Open-Weight Models

10 profiles in this group. Use this hub page to compare each model's practical VRAM floor, its expected throughput, and the best local-vs-cloud deployment path.

| Model | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|
| GPT-OSS 120B CLOUD | Estimated | 70 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B FP16 | Estimated | 80 GB | 92 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B Q4 | Estimated | 68 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B Q5 | Estimated | 70 GB | 80 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B Q8 | Estimated | 74 GB | 84 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 20B CLOUD | Estimated | 20 GB | 28 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B FP16 | Estimated | 30 GB | 42 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B Q4 | Estimated | 18 GB | 28 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B Q5 | Estimated | 20 GB | 30 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B Q8 | Estimated | 24 GB | 34 GB | RTX 6000 Ada 48GB | A100 80GB |
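If you want a quick sanity check on figures like these, a common rule of thumb is to multiply parameter count by bits-per-weight for the quantization level, then allow extra headroom for KV cache and activations. The sketch below is a rough weights-only estimator, not a measurement; the bits-per-weight values are illustrative assumptions (quantized formats carry some metadata overhead beyond their nominal bit width), and real runtime usage will be higher.

```python
# Rough, weights-only VRAM estimator (a sketch, not a measured benchmark).
# Bits-per-weight values below are assumptions: nominal quantization width
# plus a small allowance for scales/metadata. KV cache, activations, and
# framework overhead are NOT included and can add many gigabytes.
BITS_PER_PARAM = {
    "FP16": 16.0,
    "Q8": 8.5,
    "Q5": 5.5,
    "Q4": 4.5,
}

def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    bits = BITS_PER_PARAM[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

# 120B parameters at ~4.5 bits/weight works out to 67.5 GB of weights,
# in the same ballpark as the ~68 GB Q4 floor in the table above.
print(f"{weights_gb(120, 'Q4'):.1f} GB")  # → 67.5 GB
```

Treat the output as a lower bound: context length, batch size, and the serving stack determine how much sits on top of the raw weights.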

We may earn a commission if you click links on this page.