2B-4B Models
This group contains 53 profiles. Use this hub page to compare each profile's practical VRAM floor, expected throughput, and best local-versus-cloud path. All VRAM figures in the table are estimates.
| Model | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback | Detail |
|---|---|---|---|---|---|---|
| Gemma 3 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3n E4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B CLOUD | Estimated | 6GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 3.8B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-3 3.8B Q4 | Estimated | 4GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 3.8B Q5 | Estimated | 5GB | 8GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 3.8B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Granite 3.1 MoE 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.2 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
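As a rough way to read the table above, the sketch below picks the precision levels whose listed VRAM minimum fits on a given card. The thresholds are copied from the most common rows (Q4 4GB, Q5 6GB, Q8 10GB, FP16 16GB); a few rows differ (for example, Phi-3 3.8B Q5 lists 5GB), and all figures are estimates. The function names are hypothetical and exist only for illustration.

```python
from typing import List, Optional

# VRAM minimums in GB, taken from the most common rows of the table.
# These are the page's estimates, not measured values.
VRAM_MIN_GB = {"Q4": 4, "Q5": 6, "Q8": 10, "FP16": 16}


def profiles_that_fit(available_gb: float) -> List[str]:
    """Return precision labels whose listed VRAM minimum fits, smallest first."""
    fits = [(need, name) for name, need in VRAM_MIN_GB.items() if need <= available_gb]
    return [name for _, name in sorted(fits)]


def highest_precision_that_fits(available_gb: float) -> Optional[str]:
    """Return the most VRAM-hungry precision that still fits, or None."""
    fits = profiles_that_fit(available_gb)
    return fits[-1] if fits else None


if __name__ == "__main__":
    for gb in (8, 12, 24):
        print(gb, "GB:", profiles_that_fit(gb))
```

Note that this only checks the "VRAM min" column; the "VRAM optimal" column is higher for every row and is the safer target when longer contexts or larger batches are expected.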