73B-250B Models

This group contains 34 model profiles. Use this hub page to compare each profile's practical VRAM floor, optimal VRAM, and best local-vs-cloud deployment path.

| Model | Quant | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|
| DeepSeek Coder V2 236B | FP16 | 150 GB | 162 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek Coder V2 236B | Q4 | 138 GB | 148 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek Coder V2 236B | Q5 | 140 GB | 150 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek Coder V2 236B | Q8 | 144 GB | 154 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 235B | FP16 | 150 GB | 162 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 235B | Q4 | 138 GB | 148 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 235B | Q5 | 140 GB | 150 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 235B | Q8 | 144 GB | 154 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B | CLOUD | 140 GB | 148 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B | FP16 | 150 GB | 162 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B | Q4 | 138 GB | 148 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B | Q5 | 140 GB | 150 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B | Q8 | 144 GB | 154 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Mixtral 8x22B | FP16 | 150 GB | 162 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Mixtral 8x22B | Q4 | 138 GB | 148 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Mixtral 8x22B | Q5 | 140 GB | 150 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Mixtral 8x22B | Q8 | 144 GB | 154 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | FP16 | 150 GB | 162 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q4 | 138 GB | 148 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q5 | 140 GB | 150 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 122B | Q8 | 144 GB | 154 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B | CLOUD | 70 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | FP16 | 80 GB | 92 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B | Q4 | 68 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | Q5 | 70 GB | 80 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B | Q8 | 74 GB | 84 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen 110B | FP16 | 80 GB | 92 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen 110B | Q4 | 68 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen 110B | Q5 | 70 GB | 80 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen 110B | Q8 | 74 GB | 84 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.2 Vision 90B | FP16 | 80 GB | 92 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.2 Vision 90B | Q4 | 68 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.2 Vision 90B | Q5 | 70 GB | 80 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.2 Vision 90B | Q8 | 74 GB | 84 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |

All VRAM figures are estimates.
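As a rough sanity check on figures like the ones above, a VRAM floor can be approximated from parameter count and bits per weight, plus a flat allowance for KV cache and runtime buffers. This is a minimal sketch, not the exact heuristic behind the table; the 8 GB overhead figure is an assumption:

```python
def vram_floor_gb(params_b: float, bits_per_weight: float, overhead_gb: float = 8.0) -> float:
    """Rough single-GPU VRAM floor in decimal GB.

    Weight memory: params (billions) * bits-per-weight / 8 bytes each.
    overhead_gb is a flat assumption for KV cache, activations, and buffers.
    """
    weight_gb = params_b * bits_per_weight / 8  # 1B params at 8 bits = 1 GB
    return weight_gb + overhead_gb

# Example: a 120B model at 4-bit quantization
print(vram_floor_gb(120, 4))   # 68.0 GB, in line with the 120B Q4 rows
print(vram_floor_gb(120, 16))  # FP16 pushes well past a single 80 GB card
```

Real requirements shift with context length, batch size, and the quantization scheme's effective bits per weight (for example, K-quant formats carry scale metadata above their nominal bit width), so treat any such estimate as a floor, not a guarantee.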

We may earn a commission if you click links on this page.