Qwen2.5 14B Q5

Popular Ollama model family: Qwen2.5. Caveat: Estimated values are placeholders unless marked measured..

Hardware Snapshot

Family	Qwen2.5
Scenario	coding
License scope	open-source
Quantization	Q5
VRAM minimum	12GB
VRAM optimal	22GB
Best local GPU	RTX 3090 24GB
Cloud fallback	A6000 48GB
Updated	2026-02-24
Data status	Verified by Real Hardware
Ollama source	Library reference (verified: 2026-02-24)
Ollama tag	`qwen2.5:14b`
Category	coding

Benchmark Anchors

Hardware	Expected tok/s
RTX 3090 24GB	18.9
RTX 4090 24GB	25.5
A100 80GB	45.4

Real Hardware Benchmark (RTX 3090)

Tokens/s	83.952
Latency	946 ms
Prompt tokens	50
Eval tokens	53
Test time	2026-04-29T05:39:58Z
GPU model	NVIDIA GeForce RTX 3090

Verified by real hardware.

View raw nvidia-smi snapshot

Performance Curve

Reference anchors are baseline estimates. Measured RTX 3090 data is overlaid when available.

Best Hardware for Qwen2.5 14B Q5

Local run: RTX 3090 (24GB) (Check latest deal) for around 83.952 tok/s on this profile.
Cloud run: RunPod A6000 48GB , about 0.5x the local 3090 speed anchor.
Alternative cloud: Vast.ai options for flexible spot pricing.

Local vs Cloud Cost Hint

Mode	40h / month	120h / month
Local power only (3090 baseline)	$2.24	$6.72
A6000 48GB	$30.4	$91.2

Related Model Profiles

Qwen2.5 14B Q4 10GB min, 20GB optimal
Qwen2.5 14B Q8 16GB min, 26GB optimal
Qwen2.5 14B FP16 22GB min, 34GB optimal

ollama run qwen2.5:14b More coding models More 14b-class models Benchmark changelog Submit your test result Check local GPU upgrade

We may earn a commission if you click links on this page.