250B+ Models

This group contains 26 profiles. Use this hub page to compare each model's practical VRAM floor, its optimal VRAM allocation, and the best local-vs-cloud deployment path.

| Model | Quant | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|---|
| Llama 4 128x17B | FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 128x17B | Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 128x17B | Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 128x17B | Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-R1 671B | Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | FP16 | Estimated | 430GB | 442GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q4 | Estimated | 418GB | 428GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q5 | Estimated | 420GB | 430GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| DeepSeek-V3 671B | Q8 | Estimated | 424GB | 434GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 Coder 480B | CLOUD | Estimated | 215GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 Coder 480B | FP16 | Estimated | 225GB | 237GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 Coder 480B | Q4 | Estimated | 213GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 Coder 480B | Q5 | Estimated | 215GB | 225GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 Coder 480B | Q8 | Estimated | 219GB | 229GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.1 405B | FP16 | Estimated | 225GB | 237GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.1 405B | Q4 | Estimated | 213GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.1 405B | Q5 | Estimated | 215GB | 225GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.1 405B | Q8 | Estimated | 219GB | 229GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 397B-A17B | CLOUD | Estimated | 215GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16x17B | FP16 | Estimated | 225GB | 237GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16x17B | Q4 | Estimated | 213GB | 223GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16x17B | Q5 | Estimated | 215GB | 225GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16x17B | Q8 | Estimated | 219GB | 229GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
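If you want a rough sanity check before consulting a full VRAM calculator, a common back-of-envelope method is parameter count × bytes per parameter, plus a small overhead margin for KV cache and activations. The sketch below assumes standard bytes-per-weight values for each quantization level and a ~10% overhead factor; the table's own estimates use an unstated methodology, so the two will not always agree (in particular, the table appears to report effective or offloaded footprints for some MoE and FP16 entries rather than raw weight size).

```python
# Back-of-envelope VRAM floor estimator.
# Assumptions (not from the table above): standard bytes-per-weight for each
# quant level, and a flat ~10% overhead for KV cache and activations.
BYTES_PER_PARAM = {
    "FP16": 2.0,    # 16-bit weights
    "Q8": 1.0,      # ~8 bits per weight
    "Q5": 0.625,    # ~5 bits per weight
    "Q4": 0.5,      # ~4 bits per weight
}

def vram_floor_gb(params_billion: float, quant: str, overhead: float = 1.1) -> float:
    """Estimate minimum VRAM (GB) to hold the weights of a dense model,
    given its size in billions of parameters and a quantization level."""
    return params_billion * BYTES_PER_PARAM[quant] * overhead

# Example: Llama 3.1 405B at Q4
print(round(vram_floor_gb(405, "Q4")))  # 223 -- close to the table's 213-223GB band
```

For MoE models such as Llama 4 128x17B, substituting the total parameter count overstates the compute working set (only ~17B parameters are active per token), but all expert weights must still reside somewhere in VRAM or host memory, which is why every profile in this group remains cloud-first.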

We may earn a commission if you click links on this page.