# Open-Source Models
471 profiles in this group. Use this hub page to compare each profile's practical VRAM floor and optimal allocation, the best local GPU for it, and its cloud fallback path.
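The VRAM floors in the table below can be roughly approximated from parameter count and quantization width: weights take roughly `params × bits / 8` bytes, plus runtime overhead for the KV cache and buffers. A minimal sketch of that arithmetic — the `overhead_gb` constant and the per-quant bit widths are illustrative assumptions, not the method used to build this table, and it will not reproduce every row exactly:

```python
# Assumed bits per weight for common GGUF-style quantizations (approximate).
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q5": 5.5, "Q4": 4.5}

def vram_floor_gb(params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Rough VRAM floor in GB for a model.

    params_b    -- parameter count in billions
    quant       -- one of the keys in BITS_PER_WEIGHT
    overhead_gb -- assumed KV cache + runtime buffers (hypothetical constant)
    """
    # billions of params * (bits / 8) bytes per param == GB of weights
    weight_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return round(weight_gb + overhead_gb, 1)

# Example: a 7B model at FP16 needs ~14 GB of weights plus overhead.
print(vram_floor_gb(7, "FP16"))  # 16.0
```

Real loaders vary by architecture and context length, so treat this as a sanity check rather than a sizing tool.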
| Model | Data basis | VRAM min | VRAM optimal | Best local GPU | Cloud fallback | Detail |
|---|---|---|---|---|---|---|
| DeepSeek-R1 671B FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek-R1 671B Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek-R1 671B Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek-R1 671B Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek-V3 671B FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek-V3 671B Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek-V3 671B Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek-V3 671B Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 Coder 480B CLOUD | Estimated | 215GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 Coder 480B FP16 | Estimated | 225GB | 237GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 Coder 480B Q4 | Estimated | 213GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 Coder 480B Q5 | Estimated | 215GB | 225GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 Coder 480B Q8 | Estimated | 219GB | 229GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3.5 397B-A17B CLOUD | Estimated | 215GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek Coder V2 236B FP16 | Estimated | 150GB | 162GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek Coder V2 236B Q4 | Estimated | 138GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek Coder V2 236B Q5 | Estimated | 140GB | 150GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| DeepSeek Coder V2 236B Q8 | Estimated | 144GB | 154GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 235B FP16 | Estimated | 150GB | 162GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 235B Q4 | Estimated | 138GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 235B Q5 | Estimated | 140GB | 150GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 235B Q8 | Estimated | 144GB | 154GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 VL 235B CLOUD | Estimated | 140GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 VL 235B FP16 | Estimated | 150GB | 162GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 VL 235B Q4 | Estimated | 138GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 VL 235B Q5 | Estimated | 140GB | 150GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3 VL 235B Q8 | Estimated | 144GB | 154GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Mixtral 8x22B FP16 | Estimated | 150GB | 162GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Mixtral 8x22B Q4 | Estimated | 138GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Mixtral 8x22B Q5 | Estimated | 140GB | 150GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Mixtral 8x22B Q8 | Estimated | 144GB | 154GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3.5 122B FP16 | Estimated | 150GB | 162GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3.5 122B Q4 | Estimated | 138GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3.5 122B Q5 | Estimated | 140GB | 150GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen3.5 122B Q8 | Estimated | 144GB | 154GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen 110B FP16 | Estimated | 80GB | 92GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen 110B Q4 | Estimated | 68GB | 78GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen 110B Q5 | Estimated | 70GB | 80GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen 110B Q8 | Estimated | 74GB | 84GB | Cloud-first (no practical single-GPU local) | H100/H200 class | Open |
| Qwen 72B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen 72B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2 72B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2 72B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 72B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 72B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 VL 72B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 VL 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 VL 72B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 70B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 70B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 67B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 67B Q4 | Estimated | 24GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 67B Q5 | Estimated | 30GB | 32GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 67B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-V3 67B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-V3 67B Q4 | Estimated | 24GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-V3 67B Q5 | Estimated | 30GB | 32GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-V3 67B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Mixtral 8x7B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Mixtral 8x7B Q4 | Estimated | 20GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Mixtral 8x7B Q5 | Estimated | 24GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Mixtral 8x7B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen3.5 35B FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen3.5 35B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3.5 35B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen3.5 35B Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| LLaVA 34B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA 34B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA 34B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA 34B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder 33B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder 33B Q4 | Estimated | 16GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 33B Q5 | Estimated | 20GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 33B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 32B Q4 | Measured | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 32B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 32B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 32B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 32B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 32B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 32B Q4 | Estimated | 16GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 32B Q5 | Estimated | 20GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 32B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 32B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 32B Q4 | Measured | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 32B Q5 | Measured | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 32B CLOUD | Estimated | 20GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 32B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 32B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| QwQ 32B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| QwQ 32B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| QwQ 32B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| QwQ 32B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Nemotron 3 Nano 30B FP16 | Measured | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Nemotron 3 Nano 30B Q4 | Measured | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Nemotron 3 Nano 30B Q5 | Measured | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Nemotron 3 Nano 30B Q8 | Measured | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 30B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 30B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 30B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 30B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 Coder 30B CLOUD | Estimated | 20GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 Coder 30B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 Coder 30B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 Coder 30B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 Coder 30B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 30B CLOUD | Estimated | 20GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 30B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 30B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 30B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 30B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2 27B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2 27B Q4 | Estimated | 16GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 27B Q5 | Estimated | 20GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 27B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 27B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 27B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 27B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 27B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| TranslateGemma 27B FP16 | Measured | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| TranslateGemma 27B Q4 | Measured | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| TranslateGemma 27B Q5 | Measured | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| TranslateGemma 27B Q8 | Measured | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Magistral 24B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Magistral 24B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Magistral 24B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Magistral 24B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder V2 16B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder V2 16B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder V2 16B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder V2 16B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 15B FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 15B Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 15B Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 15B Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Ministral 3 14B FP16 | Measured | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Ministral 3 14B Q4 | Measured | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Ministral 3 14B Q5 | Measured | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Ministral 3 14B Q8 | Measured | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-3 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-3 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-4 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-4 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-4 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-4 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-4 Reasoning 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-4 Reasoning 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-4 Reasoning 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-4 Reasoning 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 14B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 14B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 14B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 14B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA 13B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA 13B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 13B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 13B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| OLMo 2 13B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| OLMo 2 13B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 13B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 13B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 12B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 12B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 12B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 12B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Mistral Nemo 12B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Mistral Nemo 12B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral Nemo 12B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral Nemo 12B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 10B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 10B Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 10B Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 10B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2 9B FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2 9B Q4 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 9B Q5 | Estimated | 12GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 9B Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Dolphin 3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Dolphin 3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Dolphin 3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Dolphin 3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA Llama3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA Llama3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA Llama3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA Llama3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| MiniCPM-V 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| MiniCPM-V 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| MiniCPM-V 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| MiniCPM-V 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B CLOUD | Estimated | 8GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeGemma 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| GLM 4.7 Flash 7B FP16 | Measured | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| GLM 4.7 Flash 7B Q4 | Measured | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| GLM 4.7 Flash 7B Q5 | Measured | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| GLM 4.7 Flash 7B Q8 | Measured | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Mistral 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| OLMo 2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| OpenHermes 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| OpenHermes 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| OpenHermes 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| OpenHermes 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Zephyr 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Zephyr 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Zephyr 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Zephyr 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 6.7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder 6.7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 6.7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 6.7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3n E4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B CLOUD | Estimated | 6GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 4B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 4B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 3.8B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Phi-3 3.8B Q4 | Estimated | 4GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 3.8B Q5 | Estimated | 5GB | 8GB | RTX 3090 24GB | A6000 48GB | Open |
| Phi-3 3.8B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Granite 3.1 MoE 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 3B FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 3B Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 3B Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 3B Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeGemma 2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2 2B Q4 | Estimated | 2GB | 4GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B Q5 | Estimated | 3GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3n E2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3n E2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B CLOUD | Estimated | 4GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 2B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 2B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 1.8B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 1.8B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 1.7B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 1.7B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| SmolLM2 1.7B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 1.7B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 1.5B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 1.5B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder 1.3B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 1.3B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| TinyLlama 1.1B Q4 | Estimated | 2GB | 4GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B Q5 | Estimated | 3GB | 6GB | RTX 3090 24GB | A6000 48GB | Open |
| TinyLlama 1.1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 1B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 3 1B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Granite 3.1 MoE 1B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Granite 3.1 MoE 1B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 0.6B Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 0.6B Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| BGE-M3 567M FP16 | Estimated | 4GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 0.5B Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 360M Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| MXBAI Embed Large 335M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 335M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 3 270M Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Nomic Embed Text 137M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 137M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M FP16 | Estimated | 12GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M Q4 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M Q5 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| SmolLM2 135M Q8 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 110M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| All-MiniLM 33M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 33M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| All-MiniLM 22M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Snowflake Arctic Embed 22M FP16 | Estimated | 2GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
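The "Estimated" VRAM figures above follow roughly from parameter count and quantization bit-width: weights take about `params × bits / 8` bytes, plus headroom for the KV cache, activations, and runtime buffers. A minimal sketch of that arithmetic, and of mapping the result onto the GPU tiers used in this table — note the fixed overhead value and the tier cutoffs here are illustrative assumptions, not this site's actual estimation method:

```python
def estimate_vram_gb(params_b: float, bits: int, overhead_gb: float = 1.5) -> float:
    """Rough VRAM floor for inference.

    params_b    -- parameter count in billions (e.g. 3.0 for a 3B model)
    bits        -- bits per weight (4 for Q4, 5 for Q5, 8 for Q8, 16 for FP16)
    overhead_gb -- assumed fixed headroom for KV cache / activations / buffers
    """
    weights_gb = params_b * bits / 8  # e.g. 3B at Q4 -> 1.5 GB of weights
    return round(weights_gb + overhead_gb, 1)


def pick_gpu(required_gb: float) -> str:
    """Map a VRAM requirement onto the tiers this table uses.

    The cutoffs mirror the table's columns but are an assumption here.
    """
    if required_gb <= 24:
        return "RTX 3090 24GB"
    if required_gb <= 48:
        return "RTX 6000 Ada 48GB"
    if required_gb <= 80:
        return "A100 80GB"
    return "Cloud-first (H100/H200 class)"


# A 3B model at Q4 vs FP16:
print(estimate_vram_gb(3, 4))    # 3.0
print(estimate_vram_gb(3, 16))   # 7.5
print(pick_gpu(estimate_vram_gb(3, 16)))  # RTX 3090 24GB
```

Treat this as a sanity-check heuristic only: real requirements vary with context length, batch size, and runtime, which is why the table reports both a minimum and an optimal figure.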