Today's Local LLM Pick: qwen3-coder:30b on RTX 3090 (2026)

Daily 3090 recommendation for qwen3-coder:30b: fast performer at 144.7 tok/s, RTX 3090 benchmark data, use-case fit, and local-vs-cloud decision guide.

Published: 2026-06-15 Updated: 2026-06-15 Intent: benchmark

Fast verdict

qwen3-coder:30b is a fast coding model on a 24GB RTX 3090 (144.7 tok/s). If it fits your VRAM with headroom for your target context length, it is a strong candidate for daily local use. Download it and validate on your own prompts — the numbers suggest it will handle interactive workloads comfortably.

qwen3-coder:30b approaches the 24GB boundary at higher quantizations. Consider Q4 or Q5 if you need context headroom on the RTX 3090. It ranks #2 of 18 in throughput among currently measured models on this RTX 3090. The next faster model is gpt-oss:20b (156.1 tok/s, 8% faster). The next slower model is qwen3:8b (124.6 tok/s, 16% slower).

The daily goal is simple: help a 3090 owner decide what to download tonight, what to skip, and when a cloud fallback is the better use of time.

Today’s pick

  • Model: qwen3-coder:30b
  • Category: coding
  • Size tier: large
  • Performance tier: fast
  • RTX 3090 speed: 144.7 tok/s
  • Latency: 936 ms
  • Test time: 2026-06-10T06:45:58Z
  • Baseline command:
ollama run qwen3-coder:30b

Who should try it

  • Developers evaluating qwen3-coder:30b for code completion, refactoring, or agentic coding on a local RTX 3090.
  • Teams that want a private, offline coding assistant without sending source code to a cloud API.
  • Anyone comparing {pick_tag} against Copilot or cloud coding agents on latency and throughput.

Who should skip it

  • Users whose primary workload is long-context chat or document analysis rather than code.
  • Teams that need guaranteed performance on a specific programming language; test with your own benchmark first.
  • 8GB/12GB GPU owners unless a smaller quantized variant is available.

Watch points

  • Output quality varies by language: test qwen3-coder:30b on your primary language before depending on it.
  • Temperature sensitivity: coding tasks usually perform best at temperature 0; higher values may introduce errors.
  • Context window: verify the model keeps instruction adherence stable at the context length you need.

Verified benchmark anchors

  • gpt-oss:20b: 156.1 tok/s | latency 1524 ms | test 2026-04-29T05:39:58Z
  • qwen3-coder:30b: 144.7 tok/s | latency 936 ms | test 2026-06-10T06:45:58Z
  • qwen3:8b: 124.6 tok/s | latency 1389 ms | test 2026-06-10T06:45:58Z
  • qwen2.5:14b: 84.0 tok/s | latency 946 ms | test 2026-04-29T05:39:58Z
  • ministral-3:14b: 82.0 tok/s | latency 1960 ms | test 2026-06-10T06:45:58Z

RTX 3090 decision guide

  1. VRAM check first: if qwen3-coder:30b fits with headroom at your target context length, run it locally.
  2. Latency validation: verify p95 latency matches your workload requirements under realistic concurrency.
  3. Cloud only for bursts: keep local as the default; use cloud rental for peak overflow or batch jobs.
  4. New release watch: if a newer version of qwen3-coder:30b drops, re-test within 48 hours to capture the traffic window.

Comparisons to validate

  • qwen3-coder:30b vs the next-fastest and next-slowest model in the benchmark feed.
  • qwen3-coder:30b vs gpt-oss:20b — same size tier, 145 vs 156 tok/s.
  • qwen3-coder:30b local power cost vs A100 rental for the same workload.

Next actions

  • Estimate VRAM fit: /en/tools/vram-calculator/
  • Model page: /en/models/qwen3-coder-30b-q4/
  • Benchmark changelog: /en/benchmarks/changelog/
  • Local hardware path: /en/affiliate/hardware-upgrade/
  • Cloud fallback: /go/runpod and /go/vast

Affiliate Disclosure: This post may include affiliate links. LocalVRAM may earn a commission at no extra cost.

Check model fit Open Error KB View latest verified data