Best Coding Models for Local LLM Workflows

This guide prioritizes practical coding models you can run today, with clear VRAM requirements for each. Ranking favors profiles with measured throughput first, then stable local fit under 24GB of VRAM.
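The ranking rule above can be sketched as a sort key: measured profiles come first, then models whose optimal VRAM fits a 24GB card, then lower optimal VRAM wins. This is a minimal illustration of that ordering, not the catalog's actual implementation; the sample entries mirror rows from the tables below.

```python
# Sketch of the ranking described above: measured profiles first,
# then models whose optimal VRAM fits in 24GB, then by optimal VRAM.
profiles = [
    {"model": "Qwen2.5 Coder 32B Q8", "optimal_gb": 34, "measured": False},
    {"model": "CodeLlama 7B Q4", "optimal_gb": 10, "measured": False},
    {"model": "Qwen3 8B Q4", "optimal_gb": 16, "measured": True},
]

def rank_key(p):
    # Tuples sort element-by-element; False sorts before True,
    # so measured and locally-fitting profiles rank higher.
    return (not p["measured"], p["optimal_gb"] > 24, p["optimal_gb"])

ranked = sorted(profiles, key=rank_key)
# Order: Qwen3 8B Q4, CodeLlama 7B Q4, Qwen2.5 Coder 32B Q8
```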

Catalog at a glance:
- 150 coding profiles in catalog
- 17 measured coding profiles
- 12 local-first picks (<=24GB optimal)
- 8 heavy picks (cloud-first)
Local-first coding picks (<=24GB optimal)

| Model | VRAM min/optimal | 3090 tok/s | Data |
| --- | --- | --- | --- |
| Qwen3 8B Q4 | 6GB / 16GB | 30 | Measured |
| Qwen3 8B Q5 | 8GB / 18GB | 27 | Measured |
| Qwen2.5 Coder 32B Q4 | 16GB / 20GB | 11 | Measured |
| Qwen2.5 14B Q4 | 10GB / 20GB | 21 | Measured |
| Qwen2.5 Coder 32B Q5 | 20GB / 22GB | 9.9 | Measured |
| Qwen2.5 14B Q5 | 12GB / 22GB | 18.9 | Measured |
| Qwen3 8B Q8 | 12GB / 22GB | 21.6 | Measured |
| CodeLlama 7B Q4 | 8GB / 10GB | 30 | Estimated |
| Qwen2.5 0.5B Q4 | 2GB / 10GB | 48 | Estimated |
| Qwen2.5 Coder 0.5B Q4 | 2GB / 10GB | 48 | Estimated |
| CodeLlama 7B Q5 | 10GB / 12GB | 27 | Estimated |
| CodeGemma 2B Q4 | 2GB / 12GB | 42 | Estimated |
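The minimum VRAM figures above track roughly with weight size at each quantization level. A rough rule of thumb: weight memory is parameter count times bits-per-weight divided by 8, plus an allowance for KV cache and runtime buffers. The sketch below uses approximate bits-per-weight for common GGUF quant levels and a flat overhead; both are assumptions for illustration, not the catalog's measured figures.

```python
# Rough VRAM estimator. Bits-per-weight values are approximations for
# common GGUF quant levels (K-quants carry some per-block overhead),
# not exact figures for any specific model file.
QUANT_BITS = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5, "FP16": 16.0}

def estimate_vram_gb(params_billion, quant, overhead_gb=1.5):
    """Weights plus a flat allowance for KV cache and runtime buffers."""
    weights_gb = params_billion * QUANT_BITS[quant] / 8
    return round(weights_gb + overhead_gb, 1)

estimate_vram_gb(8, "Q4")  # ~6.0, in line with Qwen3 8B Q4's 6GB minimum
```

Longer contexts grow the KV cache well beyond a flat allowance, which is one reason the optimal figures in the table sit far above the minimums.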

Heavy coding models (cloud-first)

| Model | VRAM min/optimal | Cloud fallback |
| --- | --- | --- |
| Qwen2.5 14B Q8 | 16GB / 26GB | A100 80GB |
| Qwen3 Coder 30B Q4 | 18GB / 28GB | A100 80GB |
| Qwen3 Coder 30B CLOUD | 20GB / 28GB | A100 80GB |
| Qwen3 Coder 30B Q5 | 20GB / 30GB | A100 80GB |
| Qwen3 8B FP16 | 18GB / 30GB | A100 80GB |
| Qwen2.5 Coder 32B Q8 | 24GB / 34GB | A100 80GB |
| Qwen3 Coder 30B Q8 | 24GB / 34GB | A100 80GB |
| Qwen2.5 14B FP16 | 22GB / 34GB | A100 80GB |
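The local-versus-cloud split above reduces to one comparison: run locally when a profile's optimal VRAM fits your card, otherwise rent the listed fallback. A minimal sketch of that decision, assuming a 24GB card and an A100 80GB fallback as in the tables:

```python
# Minimal run-vs-rent sketch. The 24GB threshold and A100 80GB fallback
# mirror the tables above; adjust for your own hardware.
def place_model(optimal_gb, local_vram_gb=24, fallback="A100 80GB"):
    if optimal_gb <= local_vram_gb:
        return "local"
    return f"rent {fallback}"

place_model(20)  # "local"           (e.g. Qwen2.5 Coder 32B Q4)
place_model(34)  # "rent A100 80GB"  (e.g. Qwen2.5 Coder 32B Q8)
```

Profiles near the threshold (26-30GB optimal) may still start locally at their minimum VRAM, but with reduced context length or throughput.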
See also: estimate VRAM before deployment, the coding group hub, and the run-vs-rent decision guide.