Qwen3.5 35B Q8

Popular Ollama model family: Qwen3.5. Caveat: Estimated values are placeholders unless marked measured..

Hardware Snapshot

Family	Qwen3.5
Scenario	reasoning
License scope	open-source
Quantization	Q8
VRAM minimum	44GB
VRAM optimal	54GB
Best local GPU	Dual RTX 4090 (model parallel)
Cloud fallback	A100 80GB
Updated	2026-02-24
Data status	Verified by Real Hardware
Ollama source	Library reference (verified: 2026-02-24)
Ollama tag	`qwen3.5:35b`
Category	reasoning

Benchmark Anchors

Hardware	Expected tok/s
RTX 3090 24GB	4.9
RTX 4090 24GB	6.6
A100 80GB	11.8

Real Hardware Benchmark (RTX 3090)

Tokens/s	33.822
Latency	3424 ms
Prompt tokens	31
Eval tokens	96
Test time	2026-06-17T07:31:11Z
GPU model	NVIDIA GeForce RTX 3090

Verified by real hardware.

View raw nvidia-smi snapshot

Performance Curve

Reference anchors are baseline estimates. Measured RTX 3090 data is overlaid when available.

Best Hardware for Qwen3.5 35B Q8

Local run: RTX 3090 (24GB) (Check latest deal) for around 33.822 tok/s on this profile.
Cloud run: RunPod A100 80GB , about 0.3x the local 3090 speed anchor.
Alternative cloud: Vast.ai options for flexible spot pricing.

Local vs Cloud Cost Hint

Mode	40h / month	120h / month
Local power only (3090 baseline)	$2.24	$6.72
A100 80GB	$102	$306

Related Model Profiles

Qwen3.5 35B Q4 38GB min, 48GB optimal
Qwen3.5 35B Q5 40GB min, 50GB optimal
Qwen3.5 35B FP16 50GB min, 62GB optimal

ollama run qwen3.5:35b More reasoning models More 70b-class models Benchmark changelog Submit your test result Run on RunPod Try Vast.ai

We may earn a commission if you click links on this page.