Reasoning Models

This group contains 98 model profiles. Use this hub page to compare each profile's practical VRAM floor, its optimal VRAM allocation, and the best local-versus-cloud deployment path.
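The VRAM floors in the table below can be roughly sanity-checked with a back-of-envelope estimate: weight memory is parameter count times bits per weight, plus headroom for the KV cache, activations, and runtime buffers. The sketch below is an assumption-laden approximation (the bits-per-weight values and 15% overhead factor are illustrative, not the site's own profiling method), so expect it to differ from the table's measured figures by a few GB.

```python
# Rough VRAM-floor estimate: weights + ~15% overhead for KV cache,
# activations, and runtime buffers. Bits-per-weight values are
# assumptions for common GGUF-style quantizations, not exact sizes.
BITS_PER_WEIGHT = {"FP16": 16.0, "Q8": 8.5, "Q5": 5.5, "Q4": 4.5}

def vram_floor_gb(params_b: float, quant: str, overhead: float = 0.15) -> float:
    """Estimate minimum VRAM in GB for a model with params_b billion weights."""
    weight_gb = params_b * BITS_PER_WEIGHT[quant] / 8
    return round(weight_gb * (1 + overhead), 1)

print(vram_floor_gb(32, "Q4"))  # ≈ 20.7 GB; the table lists 18 GB measured for 32B Q4
```

The estimate lands in the same ballpark as the measured 32B Q4 floor; treat the table's figures as authoritative where the two disagree.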

| Model | Quant | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|---|
| DeepSeek-R1 671B | FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | FP16 | Estimated | 150GB | 162GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q4 | Estimated | 138GB | 148GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q5 | Estimated | 140GB | 150GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q8 | Estimated | 144GB | 154GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B | CLOUD | Estimated | 70GB | 78GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | FP16 | Estimated | 80GB | 92GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B | Q4 | Estimated | 68GB | 78GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | Q5 | Estimated | 70GB | 80GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | Q8 | Estimated | 74GB | 84GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 70B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 70B | Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 70B | Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 70B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 67B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 67B | Q4 | Estimated | 24GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 67B | Q5 | Estimated | 30GB | 32GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 67B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-V3 67B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-V3 67B | Q4 | Estimated | 24GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-V3 67B | Q5 | Estimated | 30GB | 32GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-V3 67B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | FP16 | Estimated | 50GB | 62GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | Q4 | Estimated | 38GB | 48GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3.5 35B | Q5 | Estimated | 40GB | 50GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen3.5 35B | Q8 | Estimated | 44GB | 54GB | Dual RTX 4090 (model parallel) | A100 80GB |
| DeepSeek-R1 32B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 32B | Q4 | Measured | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 32B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 32B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| QwQ 32B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Magistral 24B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | CLOUD | Estimated | 20GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | FP16 | Estimated | 30GB | 42GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | Q4 | Estimated | 18GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | Q5 | Estimated | 20GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B | Q8 | Estimated | 24GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 Reasoning 14B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-4 Reasoning 14B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 Reasoning 14B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-4 Reasoning 14B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 13B | FP16 | Estimated | 22GB | 34GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 13B | Q4 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 13B | Q5 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 13B | Q8 | Estimated | 16GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 8B | FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 8B | Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 8B | Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 8B | Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B | FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 7B | Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B | Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B | Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B | FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 7B | Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B | Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B | Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Phi-3 3.8B | Q4 | Estimated | 4GB | 6GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | Q5 | Estimated | 5GB | 8GB | RTX 3090 24GB | A6000 48GB |
| Phi-3 3.8B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | FP16 | Estimated | 16GB | 28GB | RTX 6000 Ada 48GB | A100 80GB |
| Granite 3.1 MoE 3B | Q4 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | Q5 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 3B | Q8 | Estimated | 10GB | 20GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 1.5B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 1.5B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 1.5B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 1.5B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 1B | FP16 | Estimated | 14GB | 26GB | RTX 6000 Ada 48GB | A100 80GB |
| Granite 3.1 MoE 1B | Q4 | Estimated | 2GB | 12GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 1B | Q5 | Estimated | 4GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Granite 3.1 MoE 1B | Q8 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |

We may earn a commission if you click links on this page.