35B-72B Models
56 profiles in this group. Use this hub page to compare each model's practical VRAM floor and its best local-vs-cloud deployment path per quantization. All figures are estimates derived from parameter count and bits per weight, with roughly 20% headroom for KV cache and activations; they are not measured values.
| Model | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback | Detail |
|---|---|---|---|---|---|---|
| Qwen 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Qwen2 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen2 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Qwen2.5 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen2.5 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Qwen2.5 VL 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen2.5 VL 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 VL 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| CodeLlama 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| CodeLlama 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeLlama 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| CodeLlama 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| DeepSeek-R1 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| DeepSeek-R1 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 2 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 2 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 2 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 2 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 3 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 3 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 3 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 3.1 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 3.1 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.1 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 3.1 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 3.3 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 3.3 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.3 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 3.3 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| DeepSeek-R1 67B FP16 | Estimated | 135GB | 150GB | None practical | 2× A100 80GB | Open |
| DeepSeek-R1 67B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 67B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 67B Q8 | Estimated | 72GB | 80GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| DeepSeek-V3 67B FP16 | Estimated | 135GB | 150GB | None practical | 2× A100 80GB | Open |
| DeepSeek-V3 67B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-V3 67B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-V3 67B Q8 | Estimated | 72GB | 80GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| Mixtral 8x7B FP16 | Estimated | 94GB | 105GB | Dual RTX 6000 Ada 48GB (tight) | 2× A100 80GB | Open |
| Mixtral 8x7B Q4 | Estimated | 26GB | 30GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
| Mixtral 8x7B Q5 | Estimated | 32GB | 36GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
| Mixtral 8x7B Q8 | Estimated | 50GB | 56GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| Qwen3.5 35B FP16 | Estimated | 70GB | 80GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| Qwen3.5 35B Q4 | Estimated | 20GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3.5 35B Q5 | Estimated | 24GB | 28GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
| Qwen3.5 35B Q8 | Estimated | 37GB | 42GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
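The "Estimated" figures above follow a simple rule of thumb: resident weight memory is parameter count times bits per weight divided by 8, plus headroom for KV cache and activations. A minimal sketch of that arithmetic (the ~20% overhead factor and the bits-per-weight values are assumptions for illustration, not measurements):

```python
def estimated_vram_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Rule-of-thumb VRAM estimate for a dense model.

    Weight memory in GB is params (billions) * bits-per-weight / 8;
    the overhead factor (assumed ~20%) covers KV cache and activations.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead


# Approximate effective bits per weight for common quantization levels
# (Q4/Q5/Q8 values are rough GGUF-style averages, not exact).
QUANT_BPW = {"FP16": 16.0, "Q8": 8.5, "Q5": 5.5, "Q4": 4.5}

if __name__ == "__main__":
    for name, bpw in QUANT_BPW.items():
        print(f"70B {name}: ~{estimated_vram_gb(70, bpw):.0f}GB")
```

This is why a 70B model at FP16 lands in the ~140-170GB range (out of reach for any single GPU), while the same model at Q4 drops to roughly 40-48GB and fits a single 48GB card.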