Open-Weight Models

10 profiles in this group. Use this hub page to compare each model's practical VRAM floor, its expected throughput, and the best local-vs-cloud deployment path.

| Model | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|
| GPT-OSS 120B CLOUD | Estimated | 70 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B FP16 | Estimated | 80 GB | 92 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 120B Q4 | Estimated | 68 GB | 78 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B Q5 | Estimated | 70 GB | 80 GB | Dual RTX 4090 (model parallel) | A100 80GB |
| GPT-OSS 120B Q8 | Estimated | 74 GB | 84 GB | Cloud-first (no practical single-GPU local) | H100/H200 class |
| GPT-OSS 20B CLOUD | Estimated | 20 GB | 28 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B FP16 | Estimated | 30 GB | 42 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B Q4 | Estimated | 18 GB | 28 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B Q5 | Estimated | 20 GB | 30 GB | RTX 6000 Ada 48GB | A100 80GB |
| GPT-OSS 20B Q8 | Estimated | 24 GB | 34 GB | RTX 6000 Ada 48GB | A100 80GB |
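If you want a quick sanity check on figures like these, a common rule of thumb is to multiply parameter count by bits-per-weight for the quantization level, then allow extra headroom for KV cache and activations. The sketch below is a rough weights-only estimator, not a measurement; the bits-per-weight values are illustrative assumptions (quantized formats carry some metadata overhead beyond their nominal bit width), and real runtime usage will be higher.

```python
# Rough, weights-only VRAM estimator (a sketch, not a measured benchmark).
# Bits-per-weight values below are assumptions: nominal quantization width
# plus a small allowance for scales/metadata. KV cache, activations, and
# framework overhead are NOT included and can add many gigabytes.
BITS_PER_PARAM = {
    "FP16": 16.0,
    "Q8": 8.5,
    "Q5": 5.5,
    "Q4": 4.5,
}

def weights_gb(params_billion: float, quant: str) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    bits = BITS_PER_PARAM[quant]
    return params_billion * 1e9 * bits / 8 / 1e9

# 120B parameters at ~4.5 bits/weight works out to 67.5 GB of weights,
# in the same ballpark as the ~68 GB Q4 floor in the table above.
print(f"{weights_gb(120, 'Q4'):.1f} GB")  # → 67.5 GB
```

Treat the output as a lower bound: context length, batch size, and the serving stack determine how much sits on top of the raw weights.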

We may earn a commission if you click links on this page.