9B-14B Models

This group contains 72 profiles. Use this hub page to compare each model's practical VRAM floor, recommended local GPU, and cloud fallback across quantization levels.
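The VRAM floors in the table below roughly track a standard back-of-envelope estimate: parameter count times bytes per weight, plus headroom for KV cache and activations. A minimal sketch of that estimate (the per-quant byte widths and the 1.2x overhead factor are illustrative assumptions, not this site's measurement methodology):

```python
# Rough VRAM estimate for dense-transformer inference.
# Bytes-per-weight values and the overhead factor are assumptions
# for illustration; measured figures will differ.

BYTES_PER_WEIGHT = {
    "FP16": 2.0,    # 16 bits per weight
    "Q8":   1.0,    # ~8 bits per weight
    "Q5":   0.625,  # ~5 bits per weight
    "Q4":   0.5,    # ~4 bits per weight
}

def estimate_vram_gb(params_billion: float, quant: str,
                     overhead: float = 1.2) -> float:
    """Weights footprint scaled by an overhead factor for KV cache/activations."""
    weights_gb = params_billion * BYTES_PER_WEIGHT[quant]
    return round(weights_gb * overhead, 1)

for quant in ("FP16", "Q8", "Q5", "Q4"):
    print(f"14B {quant}: ~{estimate_vram_gb(14, quant)} GB")
```

For a 14B model this lands near the table's floors (Q4 around 8-10GB, FP16 around 28-34GB); context length pushes real usage toward the "VRAM optimal" column.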

| Model | Quant | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
| --- | --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Ministral 3 14B | FP16 | Measured | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Ministral 3 14B | Q4 | Measured | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Ministral 3 14B | Q5 | Measured | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Ministral 3 14B | Q8 | Measured | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 Reasoning 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 Reasoning 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 Reasoning 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 Reasoning 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 Coder 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 Coder 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| CodeLlama 13B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| CodeLlama 13B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| CodeLlama 13B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| CodeLlama 13B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 2 13B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 2 13B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Llama 2 13B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Llama 2 13B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 13B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 13B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| LLaVA 13B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| LLaVA 13B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 13B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 13B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 13B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 13B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 12B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 12B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Gemma 3 12B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Gemma 3 12B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Mistral Nemo 12B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Mistral Nemo 12B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Mistral Nemo 12B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Mistral Nemo 12B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.2 Vision 11B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.2 Vision 11B | Q4 | Estimated | 12GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.2 Vision 11B | Q5 | Estimated | 14GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.2 Vision 11B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Falcon 3 10B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Falcon 3 10B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 10B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 10B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 2 9B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 2 9B | Q4 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB |
| Gemma 2 9B | Q5 | Estimated | 12GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Gemma 2 9B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |

We may earn a commission if you click links on this page.