Sub-2B Models

This group contains 115 model profiles. Use this hub page to compare each profile's practical VRAM floor and optimal VRAM at each quantization level, the best local GPU for it, and the cloud fallback when local hardware falls short. All figures are estimates, not measured values.

| Model | Quant | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback | Detail |
|---|---|---|---|---|---|---|---|
| CodeGemma 2B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeGemma 2B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 2B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 2B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2 2B | Q4 | Estimated | 2GB | 4GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B | Q5 | Estimated | 3GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3n E2B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B | CLOUD | Estimated | 4GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 2B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 1.8B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 1.7B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| SmolLM2 1.7B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 1.5B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 1.5B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 1.5B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 1.5B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder 1.3B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| TinyLlama 1.1B | Q4 | Estimated | 2GB | 4GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B | Q5 | Estimated | 3GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 1B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 1B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Granite 3.1 MoE 1B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 1B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.2 1B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 1B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.2 1B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 0.6B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| BGE-M3 567M | FP16 | Estimated | 4GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B | FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B | Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B | Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B | Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B | FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B | Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B | Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B | Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B | FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B | Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B | Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B | Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B | FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B | Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B | Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B | Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M | FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M | Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M | Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M | Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| MXBAI Embed Large 335M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 335M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M | FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M | Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M | Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M | Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Nomic Embed Text 137M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 137M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M | FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M | Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M | Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M | Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 110M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| All-MiniLM 33M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 33M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| All-MiniLM 22M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 22M | FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
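The VRAM floors in the table are estimates. As a sanity check, you can sketch a weights-only lower bound from parameter count and quantization width. The bytes-per-parameter figures below are my assumptions for common GGUF quantizations (K-quants carry scale metadata, so they average slightly more than their nominal bit width), and the flat 1GB overhead is a placeholder for KV cache and runtime buffers, not a value from this page:

```python
# Assumed average bytes per parameter for common quantizations
# (approximate; K-quants include per-block scale metadata).
BYTES_PER_PARAM = {
    "FP16": 2.0,
    "Q8": 1.06,   # Q8_0 is ~8.5 bits/param
    "Q5": 0.69,   # Q5_K_M averages ~5.5 bits/param
    "Q4": 0.56,   # Q4_K_M averages ~4.5 bits/param
}

def vram_floor_gb(params_billion: float, quant: str, overhead_gb: float = 1.0) -> float:
    """Weights-only lower bound plus a flat overhead for KV cache and buffers."""
    weights_gb = params_billion * BYTES_PER_PARAM[quant]
    return weights_gb + overhead_gb

# A 2B model at Q4: roughly 2.1GB, in line with the ~2GB floors above.
print(round(vram_floor_gb(2.0, "Q4"), 1))
```

The table's floors generally run higher than this weights-only bound because they also budget for a usable context window; treat both as rough guides and verify against your own runtime before buying hardware.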