35B-72B Models
56 profiles in this group. Use this hub page to compare each model's practical VRAM floor and its best local-vs-cloud deployment path per quantization. All figures are estimates derived from parameter count and bits per weight, with roughly 20% headroom for KV cache and activations; they are not measured values.
| Model | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback | Detail |
|---|---|---|---|---|---|---|
| Qwen 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Qwen2 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen2 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Qwen2.5 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen2.5 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Qwen2.5 VL 72B FP16 | Estimated | 145GB | 160GB | None practical | 2× A100 80GB | Open |
| Qwen2.5 VL 72B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 72B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Qwen2.5 VL 72B Q8 | Estimated | 77GB | 85GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| CodeLlama 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| CodeLlama 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeLlama 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| CodeLlama 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| DeepSeek-R1 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| DeepSeek-R1 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 2 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 2 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 2 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 2 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 3 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 3 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 3 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 3.1 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 3.1 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.1 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 3.1 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| Llama 3.3 70B FP16 | Estimated | 140GB | 155GB | None practical | 2× A100 80GB | Open |
| Llama 3.3 70B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.3 70B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| Llama 3.3 70B Q8 | Estimated | 75GB | 82GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB (tight) | Open |
| DeepSeek-R1 67B FP16 | Estimated | 135GB | 150GB | None practical | 2× A100 80GB | Open |
| DeepSeek-R1 67B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 67B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-R1 67B Q8 | Estimated | 72GB | 80GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| DeepSeek-V3 67B FP16 | Estimated | 135GB | 150GB | None practical | 2× A100 80GB | Open |
| DeepSeek-V3 67B Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-V3 67B Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB | Open |
| DeepSeek-V3 67B Q8 | Estimated | 72GB | 80GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| Mixtral 8x7B FP16 | Estimated | 94GB | 105GB | Dual RTX 6000 Ada 48GB (tight) | 2× A100 80GB | Open |
| Mixtral 8x7B Q4 | Estimated | 26GB | 30GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
| Mixtral 8x7B Q5 | Estimated | 32GB | 36GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
| Mixtral 8x7B Q8 | Estimated | 50GB | 56GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| Qwen3.5 35B FP16 | Estimated | 70GB | 80GB | Dual RTX 6000 Ada 48GB (model parallel) | A100 80GB | Open |
| Qwen3.5 35B Q4 | Estimated | 20GB | 24GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3.5 35B Q5 | Estimated | 24GB | 28GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
| Qwen3.5 35B Q8 | Estimated | 37GB | 42GB | RTX 6000 Ada 48GB | A6000 48GB | Open |
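The "Estimated" figures above follow a simple rule of thumb: resident weight memory is parameter count times bits per weight divided by 8, plus headroom for KV cache and activations. A minimal sketch of that arithmetic (the ~20% overhead factor and the bits-per-weight values are assumptions for illustration, not measurements):

```python
def estimated_vram_gb(params_billion: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Rule-of-thumb VRAM estimate for a dense model.

    Weight memory in GB is params (billions) * bits-per-weight / 8;
    the overhead factor (assumed ~20%) covers KV cache and activations.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead


# Approximate effective bits per weight for common quantization levels
# (Q4/Q5/Q8 values are rough GGUF-style averages, not exact).
QUANT_BPW = {"FP16": 16.0, "Q8": 8.5, "Q5": 5.5, "Q4": 4.5}

if __name__ == "__main__":
    for name, bpw in QUANT_BPW.items():
        print(f"70B {name}: ~{estimated_vram_gb(70, bpw):.0f}GB")
```

This is why a 70B model at FP16 lands in the ~140-170GB range (out of reach for any single GPU), while the same model at Q4 drops to roughly 40-48GB and fits a single 48GB card.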