2B-4B Models

This group contains 53 profiles. Use this hub page to compare each model's practical VRAM floor, its optimal VRAM allocation, and the best local GPU or cloud fallback for running it.

| Model | Quant | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|---|
| Gemma 3 4B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 4B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Gemma 3 4B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Gemma 3 4B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E4B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3n E4B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E4B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E4B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen 4B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen 4B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Qwen 4B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen 4B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 4B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 4B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 4B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 4B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B | CLOUD | Estimated | 6GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 4B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 3.8B | Q4 | Estimated | 4GB | 6GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | Q5 | Estimated | 5GB | 8GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Falcon 3 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Granite 3.1 MoE 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.2 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.2 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.2 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.2 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 Coder 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| StarCoder2 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| StarCoder2 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| StarCoder2 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| StarCoder2 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
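The "Estimated" figures above follow the usual weights-plus-overhead rule of thumb: parameter count times effective bits per weight, plus headroom for KV cache and runtime buffers. A minimal sketch of that estimate, assuming typical effective bit widths for common GGUF-style quant levels (the exact bits per weight vary by quant variant, so treat these constants as illustrative):

```python
# Rough weights-only VRAM estimate for a quantized model.
# Effective bits per weight are assumptions based on common
# GGUF quant levels; real values vary by variant and runtime.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.5,   # e.g. Q8_0 stores scales alongside 8-bit weights
    "Q5": 5.5,   # e.g. Q5_K_M-style mixed quantization
    "Q4": 4.5,   # e.g. Q4_K_M-style mixed quantization
}

def estimate_vram_gb(params_billions: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Return an approximate VRAM floor in GB.

    params_billions: model size in billions of parameters.
    quant: one of the keys in BITS_PER_WEIGHT.
    overhead_gb: headroom for KV cache, activations, and runtime
                 buffers (grows with context length; 1GB is a
                 short-context placeholder).
    """
    bits = BITS_PER_WEIGHT[quant]
    weights_gb = params_billions * bits / 8  # bits -> bytes per parameter
    return weights_gb + overhead_gb

# A 4B model at Q4 lands near the ~4GB minimum listed in the table.
print(round(estimate_vram_gb(4, "Q4"), 1))
```

This explains why the Q4 rows cluster around a 4GB floor while FP16 needs 16GB: the weights alone shrink roughly 3-4x, and the remaining gap to the "optimal" column is context and batching headroom.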
