Reasoning Models

This group contains 98 model profiles. Use this hub page to compare each profile's practical VRAM floor, its optimal VRAM allocation, and the best local-versus-cloud deployment path.
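The VRAM floors in the table below can be roughly sanity-checked with a back-of-envelope estimate: weight memory is parameter count times bits per weight, plus headroom for the KV cache, activations, and runtime buffers. The sketch below is an assumption-laden approximation (the bits-per-weight values and 15% overhead factor are illustrative, not the site's own profiling method), so expect it to differ from the table's measured figures by a few GB.

```python
# Rough VRAM-floor estimate: weights + ~15% overhead for KV cache,
# activations, and runtime buffers. Bits-per-weight values are
# assumptions for common GGUF-style quantizations, not exact sizes.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q5": 5.5, "Q4": 4.5}

def vram_floor_gb(params_b: float, quant: str, overhead: float = 0.15) -> float:
    """Estimate minimum VRAM in GB for a model with params_b billion weights."""
    weight_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return round(weight_gb * (1 + overhead), 1)

print(vram_floor_gb(32, "Q4"))  # ≈ 20.7 GB; the table lists 18 GB measured for 32B Q4
```

The estimate lands in the same ballpark as the measured 32B Q4 floor; treat the table's figures as authoritative where the two disagree.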

| Model | Quant | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|---|
| DeepSeek-R1 671B | FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | FP16 | Estimated | 150GB | 162GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q4 | Estimated | 138GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q5 | Estimated | 140GB | 150GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q8 | Estimated | 144GB | 154GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B | CLOUD | Estimated | 70GB | 78GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | FP16 | Estimated | 80GB | 92GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B | Q4 | Estimated | 68GB | 78GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | Q5 | Estimated | 70GB | 80GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | Q8 | Estimated | 74GB | 84GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 70B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 70B | Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 70B | Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 70B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 67B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 67B | Q4 | Estimated | 24GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 67B | Q5 | Estimated | 30GB | 32GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 67B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-V3 67B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-V3 67B | Q4 | Estimated | 24GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-V3 67B | Q5 | Estimated | 30GB | 32GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-V3 67B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3.5 35B | Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 32B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 32B | Q4 | Measured | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 32B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 32B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | CLOUD | Estimated | 20GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 Reasoning 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 Reasoning 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 Reasoning 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 Reasoning 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 13B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 13B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 13B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 13B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 8B | FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 8B | Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 8B | Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 8B | Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B | FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 7B | Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B | Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B | Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B | FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 7B | Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B | Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B | Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 3.8B | Q4 | Estimated | 4GB | 6GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | Q5 | Estimated | 5GB | 8GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Granite 3.1 MoE 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 1.5B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 1.5B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 1.5B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 1.5B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 1B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Granite 3.1 MoE 1B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 1B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 1B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |

We may earn a commission if you click links on this page.