Llama 4 16X17B Q5

Part of Llama 4, a popular Ollama model family. Caveat: estimated values are placeholders unless marked as measured.

Hardware Snapshot

Family:          Llama 4
Scenario:        multimodal
License scope:   closed-weight
Quantization:    Q5
VRAM minimum:    215 GB
VRAM optimal:    225 GB
Best local GPU:  cloud-first (no practical single-GPU local option)
Cloud fallback:  H100/H200 class
Updated:         2026-02-24
Data status:     verified on real hardware
Ollama source:   library reference (verified 2026-02-24)
Ollama tag:      llama4:16x17b
Category:        multimodal
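As a rough sanity check on the VRAM figures above, a weights-only lower bound can be computed from parameter count and quantization bit-width. The parameter count below is an illustrative assumption, not a value from this page; real requirements (such as the 215 GB minimum above) also cover KV cache, multimodal buffers, and runtime overhead, which can dominate at long context lengths.

```python
def weights_only_gb(n_params: float, bits_per_weight: float) -> float:
    """Lower-bound VRAM needed for the model weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9

# Illustrative assumption: a 16x17B MoE with ~109B total parameters at Q5.
# (Exact totals vary by architecture; this is NOT an official figure.)
total_params = 109e9
print(f"weights-only lower bound: ~{weights_only_gb(total_params, 5):.0f} GB")
```

The gap between such a weights-only bound and the table's VRAM figures is expected: serving a multimodal MoE at useful context lengths requires substantial memory beyond the quantized weights.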

Benchmark Anchors

Hardware        Expected tok/s
RTX 3090 24GB   1.0
RTX 4090 24GB   1.4
A100 80GB       2.4

Real Hardware Benchmark (RTX 3090)

Tokens/s:       7.604
Latency:        9383 ms
Prompt tokens:  363
Eval tokens:    64
Test time:      2026-04-01T11:53:50Z
GPU model:      NVIDIA GeForce RTX 3090

Verified by real hardware.
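The measured numbers are mutually consistent if tokens/s counts generated tokens over generation time only (Ollama's eval_count / eval_duration convention), with the rest of the latency spent processing the 363-token prompt. A sketch of that breakdown, using the figures from the table above:

```python
# Reported measurements from the benchmark table above.
tok_per_s = 7.604    # reported generation rate
latency_s = 9.383    # total request latency (9383 ms)
eval_tokens = 64     # generated tokens

# Assuming tok/s = generated tokens / generation time (not total latency):
gen_time_s = eval_tokens / tok_per_s      # time spent generating
prompt_time_s = latency_s - gen_time_s    # remainder: prompt processing
print(f"generation: {gen_time_s:.2f} s, prompt processing: {prompt_time_s:.2f} s")
```

Under that assumption, roughly 8.4 s of the 9.4 s request went to generation and under a second to prompt processing.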


Performance Curve

Reference anchors are baseline estimates. Measured RTX 3090 data is overlaid when available.

Best Hardware for Llama 4 16X17B Q5

Local vs Cloud Cost Hint

Mode                               40 h/month   120 h/month
Local power only (3090 baseline)   $2.24        $6.72
H100/H200 class                    $196         $588
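The table's figures are consistent with a few simple assumptions (inferred here, not stated on this page): roughly 350 W draw for the 3090 at $0.16/kWh, and about $4.90/h for H100/H200-class cloud time. A sketch reproducing the numbers under those assumptions:

```python
def local_power_cost(hours: float, watts: float = 350,
                     usd_per_kwh: float = 0.16) -> float:
    """Electricity-only cost of running a local GPU (assumed rates)."""
    return watts / 1000 * hours * usd_per_kwh

def cloud_cost(hours: float, usd_per_hour: float = 4.90) -> float:
    """Cloud rental cost at an assumed flat hourly rate."""
    return hours * usd_per_hour

for h in (40, 120):
    print(f"{h}h: local ${local_power_cost(h):.2f}, cloud ${cloud_cost(h):.0f}")
```

Note the local figure covers electricity only; it excludes the upfront hardware cost, which is the main reason a cloud fallback is listed for a model this large.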
ollama run llama4:16x17b
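Beyond the CLI command above, the same tag can be queried through Ollama's local REST API, which also returns the counters (eval_count, eval_duration) behind tokens/s figures like the one measured above. A minimal sketch, assuming a default Ollama install listening on localhost:11434:

```python
import json
import urllib.request

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str, model: str = "llama4:16x17b",
             host: str = "http://localhost:11434") -> dict:
    """POST a prompt to a local Ollama server and return the parsed response."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # includes eval_count and eval_duration (ns)

if __name__ == "__main__":
    out = generate("Describe this model family in one sentence.")
    # tokens/s computed from Ollama's own counters:
    print(out["eval_count"] / (out["eval_duration"] / 1e9))
```

This requires the model to already be pulled; at roughly 215 GB of VRAM demand, running it locally is impractical on consumer hardware, matching the cloud-first recommendation above.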

We may earn a commission if you click links on this page.