Qwen3.5 122B Q8

Popular Ollama model family: Qwen3.5. Caveat: Estimated values are placeholders unless marked measured..

Hardware Snapshot

Family	Qwen3.5
Scenario	reasoning
License scope	open-source
Quantization	Q8
VRAM minimum	144GB
VRAM optimal	154GB
Best local GPU	Cloud-first (no practical single-GPU local)
Cloud fallback	H100/H200 class
Updated	2026-02-24
Data status	Verified by Real Hardware
Ollama source	Library reference (verified: 2026-02-24)
Ollama tag	`qwen3.5:122b`
Category	reasoning

Verified by real hardware.

Reference anchors are baseline estimates. Measured RTX 3090 data is overlaid when available.

Local run: RTX 3090 (24GB) (Check latest deal) for around 4.931 tok/s on this profile.
Cloud run: RunPod H100/H200 class , about 0.7x the local 3090 speed anchor.
Alternative cloud: Vast.ai options for flexible spot pricing.

Mode	40h / month	120h / month
Local power only (3090 baseline)	$2.24	$6.72
H100/H200 class	$196	$588

We may earn a commission if you click links on this page.