Llama 4 16X17B Q5

Part of Llama 4, a popular Ollama model family. Caveat: estimated values are placeholders unless marked as measured.

Hardware Snapshot

Family:          Llama 4
Scenario:        multimodal
License scope:   closed-weight
Quantization:    Q5
VRAM minimum:    215 GB
VRAM optimal:    225 GB
Best local GPU:  cloud-first (no practical single-GPU local option)
Cloud fallback:  H100/H200 class
Updated:         2026-02-24
Data status:     verified on real hardware
Ollama source:   library reference (verified 2026-02-24)
Ollama tag:      llama4:16x17b
Category:        multimodal
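As a rough sanity check on the VRAM figures above, a weights-only lower bound can be computed from parameter count and quantization bit-width. The parameter count below is an illustrative assumption, not a value from this page; real requirements (such as the 215 GB minimum above) also cover KV cache, multimodal buffers, and runtime overhead, which can dominate at long context lengths.

```python
def weights_only_gb(n_params: float, bits_per_weight: float) -> float:
    """Lower-bound VRAM needed for the model weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9

# Illustrative assumption: a 16x17B MoE with ~109B total parameters at Q5.
# (Exact totals vary by architecture; this is NOT an official figure.)
total_params = 109e9
print(f"weights-only lower bound: ~{weights_only_gb(total_params, 5):.0f} GB")
```

The gap between such a weights-only bound and the table's VRAM figures is expected: serving a multimodal MoE at useful context lengths requires substantial memory beyond the quantized weights.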

Benchmark Anchors

Hardware        Expected tok/s
RTX 3090 24GB   1.0
RTX 4090 24GB   1.4
A100 80GB       2.4

Real Hardware Benchmark (RTX 3090)

Tokens/s:       7.604
Latency:        9383 ms
Prompt tokens:  363
Eval tokens:    64
Test time:      2026-04-01T11:53:50Z
GPU model:      NVIDIA GeForce RTX 3090

Verified by real hardware.
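The measured numbers are mutually consistent if tokens/s counts generated tokens over generation time only (Ollama's eval_count / eval_duration convention), with the rest of the latency spent processing the 363-token prompt. A sketch of that breakdown, using the figures from the table above:

```python
# Reported measurements from the benchmark table above.
tok_per_s = 7.604    # reported generation rate
latency_s = 9.383    # total request latency (9383 ms)
eval_tokens = 64     # generated tokens

# Assuming tok/s = generated tokens / generation time (not total latency):
gen_time_s = eval_tokens / tok_per_s      # time spent generating
prompt_time_s = latency_s - gen_time_s    # remainder: prompt processing
print(f"generation: {gen_time_s:.2f} s, prompt processing: {prompt_time_s:.2f} s")
```

Under that assumption, roughly 8.4 s of the 9.4 s request went to generation and under a second to prompt processing.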


Performance Curve

Reference anchors are baseline estimates. Measured RTX 3090 data is overlaid when available.

Best Hardware for Llama 4 16X17B Q5

Local vs Cloud Cost Hint

Mode                               40 h/month   120 h/month
Local power only (3090 baseline)   $2.24        $6.72
H100/H200 class                    $196         $588
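The table's figures are consistent with a few simple assumptions (inferred here, not stated on this page): roughly 350 W draw for the 3090 at $0.16/kWh, and about $4.90/h for H100/H200-class cloud time. A sketch reproducing the numbers under those assumptions:

```python
def local_power_cost(hours: float, watts: float = 350,
                     usd_per_kwh: float = 0.16) -> float:
    """Electricity-only cost of running a local GPU (assumed rates)."""
    return watts / 1000 * hours * usd_per_kwh

def cloud_cost(hours: float, usd_per_hour: float = 4.90) -> float:
    """Cloud rental cost at an assumed flat hourly rate."""
    return hours * usd_per_hour

for h in (40, 120):
    print(f"{h}h: local ${local_power_cost(h):.2f}, cloud ${cloud_cost(h):.0f}")
```

Note the local figure covers electricity only; it excludes the upfront hardware cost, which is the main reason a cloud fallback is listed for a model this large.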
ollama run llama4:16x17b
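Beyond the CLI command above, the same tag can be queried through Ollama's local REST API, which also returns the counters (eval_count, eval_duration) behind tokens/s figures like the one measured above. A minimal sketch, assuming a default Ollama install listening on localhost:11434:

```python
import json
import urllib.request

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama4:16x17b",
             host: str = "http://localhost:11434") -> dict:
    """POST a prompt to a local Ollama server and return the parsed response."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # includes eval_count and eval_duration (ns)

if __name__ == "__main__":
    out = generate("Describe this model family in one sentence.")
    # tokens/s computed from Ollama's own counters:
    print(out["eval_count"] / (out["eval_duration"] / 1e9))
```

This requires the model to already be pulled; at roughly 215 GB of VRAM demand, running it locally is impractical on consumer hardware, matching the cloud-first recommendation above.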

We may earn a commission if you click links on this page.