Sub-2B Models
This group contains 115 model profiles. Use this hub page to compare each model's practical VRAM floor, its recommended (optimal) VRAM allocation, and the best local-vs-cloud deployment path. All figures below are estimates, not measured values.
| Model | Data source | VRAM min | VRAM optimal | Best local GPU | Cloud fallback | Detail |
|---|---|---|---|---|---|---|
| CodeGemma 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeGemma 2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2 2B Q4 | Estimated | 2GB | 4GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B Q5 | Estimated | 3GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3n E2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B CLOUD | Estimated | 4GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 1.8B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 1.7B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| SmolLM2 1.7B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder 1.3B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| TinyLlama 1.1B Q4 | Estimated | 2GB | 4GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B Q5 | Estimated | 3GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 1B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 1B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Granite 3.1 MoE 1B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.2 1B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 1B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 0.6B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| BGE-M3 567M FP16 | Estimated | 4GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| MXBAI Embed Large 335M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 335M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Nomic Embed Text 137M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 137M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 110M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| All-MiniLM 33M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 33M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| All-MiniLM 22M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 22M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
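The "Estimated" VRAM figures above can be cross-checked with a back-of-envelope calculation: weight memory is roughly parameter count times bytes per weight for the given quantization, plus a flat overhead for the runtime context, activations, and KV cache. This is a minimal sketch; the bytes-per-weight values follow common GGUF conventions and the 1 GB overhead constant is an illustrative assumption, not a figure from this page.

```python
# Rough VRAM estimate: weight memory plus a fixed runtime overhead.
# Bytes-per-weight values are approximate; the overhead constant is
# an illustrative assumption, not a measured value.

BYTES_PER_PARAM = {
    "FP16": 2.0,    # 16 bits per weight
    "Q8": 1.0,      # ~8 bits per weight
    "Q5": 0.625,    # ~5 bits per weight
    "Q4": 0.5,      # ~4 bits per weight
}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead_gb: float = 1.0) -> float:
    """Approximate GB of VRAM to load the weights, plus a flat
    overhead for runtime context, activations, and a modest KV cache."""
    weights_gb = params_billions * BYTES_PER_PARAM[quant]
    return round(weights_gb + overhead_gb, 1)

# Example: a 2B model at Q4 -> 2.0 * 0.5 + 1.0 = 2.0 GB,
# in line with the 2GB minimums listed for the 2B Q4 rows above.
print(estimate_vram_gb(2.0, "Q4"))  # 2.0
```

Real-world usage varies with context length, batch size, and runtime, so treat this as a floor estimate rather than a sizing guarantee.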