35B-72B Models

This group contains 56 model profiles. Use this hub page to compare each model's practical VRAM floor, recommended local hardware, and best local-vs-cloud path. All VRAM figures are estimates.

| Model | Quant | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|---|
| Qwen 72B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen 72B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen 72B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen 72B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2 72B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2 72B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2 72B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2 72B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 72B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 72B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 72B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 72B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 VL 72B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 VL 72B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 72B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 VL 72B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| CodeLlama 70B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| CodeLlama 70B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| CodeLlama 70B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| CodeLlama 70B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 70B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 70B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 70B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 70B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 2 70B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 2 70B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 2 70B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 2 70B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3 70B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3 70B | Q4 | Estimated | 24 GB | 30 GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3 70B | Q5 | Estimated | 30 GB | 32 GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3 70B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.1 70B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.1 70B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.1 70B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.1 70B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.3 70B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.3 70B | Q4 | Estimated | 24 GB | 30 GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.3 70B | Q5 | Estimated | 30 GB | 32 GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.3 70B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 67B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 67B | Q4 | Estimated | 24 GB | 30 GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 67B | Q5 | Estimated | 30 GB | 32 GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 67B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-V3 67B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-V3 67B | Q4 | Estimated | 24 GB | 30 GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-V3 67B | Q5 | Estimated | 30 GB | 32 GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-V3 67B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Mixtral 8x7B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Mixtral 8x7B | Q4 | Estimated | 20 GB | 24 GB | RTX 3090 24GB | A6000 48GB |
| Mixtral 8x7B | Q5 | Estimated | 24 GB | 26 GB | RTX 6000 Ada 48GB | A100 80GB |
| Mixtral 8x7B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | FP16 | Estimated | 50 GB | 62 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | Q4 | Estimated | 38 GB | 48 GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3.5 35B | Q5 | Estimated | 40 GB | 50 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | Q8 | Estimated | 44 GB | 54 GB | Dual RTX 4090 (model parallel) | A100 80GB |
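A quick way to sanity-check VRAM estimates like these yourself: weight memory is roughly parameter count times bytes per weight, plus some headroom for the KV cache and runtime buffers. A minimal sketch, assuming a ~20% overhead factor (a common rule of thumb, not a figure from this page):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM floor in GB for loading a model.

    params_billions : parameter count in billions (e.g. 70 for a 70B model)
    bits_per_weight : 16 for FP16, ~4.5 for typical Q4 quants, ~8 for Q8
    overhead        : assumed multiplier for KV cache / runtime buffers
    """
    # 1e9 params * (bits/8) bytes per param ~= that many GB
    weight_gb = params_billions * bits_per_weight / 8
    return round(weight_gb * overhead, 1)

# A 70B model at Q4 (~4.5 effective bits per weight in common quant formats):
print(estimate_vram_gb(70, 4.5))  # lands in the same range as the Q4 rows above
```

Real requirements also scale with context length (KV cache grows with sequence length and batch size), so treat this as a floor, not a guarantee.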

We may earn a commission if you click links on this page.