Qwen3.5 35B Q8

Popular Ollama model family: Qwen3.5. Caveat: estimated values are placeholders unless marked as measured.

Hardware Snapshot

Family: Qwen3.5
Scenario: reasoning
License scope: open-source
Quantization: Q8
VRAM minimum: 44 GB
VRAM optimal: 54 GB
Best local GPU: Dual RTX 4090 (model parallel)
Cloud fallback: A100 80GB
Updated: 2026-02-24
Data status: Verified by Real Hardware
Ollama source: Library reference (verified: 2026-02-24)
Ollama tag: qwen3.5:35b
Category: reasoning

Benchmark Anchors

Hardware         Expected tok/s
RTX 3090 24GB    4.9
RTX 4090 24GB    6.6
A100 80GB        11.8

Real Hardware Benchmark (RTX 3090)

Tokens/s: 35.075
Latency: 3585 ms
Prompt tokens: 31
Eval tokens: 96
Test time: 2026-03-15T12:17:40Z
GPU model: NVIDIA GeForce RTX 3090

Verified by real hardware.
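
For readers who want to reproduce a figure like the one above, here is a minimal sketch of how tokens per second can be derived from an Ollama generate response. It assumes the measurement is based on the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama's `/api/generate` endpoint returns; this page does not state how its number was actually measured. The helper name `tokens_per_second` and the sample duration are illustrative only.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama-style response dict.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    # Convert nanoseconds to seconds before dividing.
    return eval_count / (eval_duration_ns / 1e9)


# Hypothetical response: 96 eval tokens (as in the table above) and an
# assumed generation time of 2.737 s, chosen only for illustration.
sample = {"eval_count": 96, "eval_duration": 2_737_000_000}
rate = tokens_per_second(sample)
print(f"{rate:.3f} tok/s")
```

Prompt processing and model load time are reported in separate fields of the response, so a rate computed this way covers generation only and will not match a figure derived from end-to-end latency.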

Performance Curve

Reference anchors are baseline estimates. Measured RTX 3090 data is overlaid when available.

Best Hardware for Qwen3.5 35B Q8

Local vs Cloud Cost Hint

Mode                                40h / month    120h / month
Local power only (3090 baseline)    $2.24          $6.72
A100 80GB                           $102           $306
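
The table's figures scale linearly with usage, so each row implies a flat hourly rate: about $0.056 per hour for local power and $2.55 per hour for the A100 80GB. The sketch below recomputes the monthly totals from those implied rates. The rates are derived by dividing the table values by the hours, not quoted by the page, and the function name `monthly_cost` is illustrative.

```python
# Hourly rates implied by the table (table value / hours); these are
# derived assumptions, not figures quoted by the page.
LOCAL_POWER_USD_PER_HOUR = 2.24 / 40   # about 0.056
A100_USD_PER_HOUR = 102 / 40           # 2.55


def monthly_cost(hours: float, usd_per_hour: float) -> float:
    """Monthly cost in USD for a flat hourly rate, rounded to cents."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return round(hours * usd_per_hour, 2)


for hours in (40, 120):
    local = monthly_cost(hours, LOCAL_POWER_USD_PER_HOUR)
    cloud = monthly_cost(hours, A100_USD_PER_HOUR)
    print(f"{hours}h/month: local ${local:.2f}, A100 ${cloud:.2f}")
```

The local row covers power only, so this comparison leaves out the cost of buying the GPU.
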
ollama run qwen3.5:35b