Last updated: 2026-04-01.
This page is updated whenever benchmark baselines change.
2026-04-01 - Weekly benchmark run (15 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.86 | Ollama 0.17.7
- qwen3:8b avg 125.651 tok/s, avg latency 1.554s (2 run(s))
- deepseek-r1:14b avg 80.156 tok/s, avg latency 2.027s (2 run(s))
- qwen2.5:14b avg 77.664 tok/s, avg latency 1.072s (2 run(s))
- qwen3-coder:30b avg 153.373 tok/s, avg latency 0.961s (2 run(s))
- qwen3.5:27b avg 34.937 tok/s, avg latency 3.503s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.462 tok/s, avg latency 16.382s (2 run(s))
- qwen2.5-coder:32b avg 37.884 tok/s, avg latency 1.521s (2 run(s))
- ministral-3:14b avg 82.665 tok/s, avg latency 2.39s (2 run(s))
- gpt-oss:20b avg 29.585 tok/s, avg latency 4.118s (2 run(s))
- mistral-small:22b avg 16.674 tok/s, avg latency 5.609s (2 run(s))
- gemma3:27b avg 9.073 tok/s, avg latency 11.605s (2 run(s))
- llama4:latest avg 7.604 tok/s, avg latency 9.383s (2 run(s))
- qwq:32b avg 6.579 tok/s, avg latency 15.25s (2 run(s))
- translategemma:27b avg 41.293 tok/s, avg latency 3.142s (2 run(s))
- nemotron-3-nano:30b avg 57.048 tok/s, avg latency 2.468s (2 run(s))
2026-03-15 - Weekly benchmark run (15 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.7
- qwen3:8b avg 137.231 tok/s, avg latency 1.488s (2 run(s))
- deepseek-r1:14b avg 84.107 tok/s, avg latency 1.942s (2 run(s))
- qwen2.5:14b avg 85.32 tok/s, avg latency 0.999s (2 run(s))
- qwen3-coder:30b avg 160.041 tok/s, avg latency 0.983s (2 run(s))
- qwen3.5:27b avg 35.075 tok/s, avg latency 3.585s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.753 tok/s, avg latency 15.154s (2 run(s))
- qwen2.5-coder:32b avg 41.117 tok/s, avg latency 1.422s (2 run(s))
- ministral-3:14b avg 89.264 tok/s, avg latency 2.087s (2 run(s))
- gpt-oss:20b avg 165.995 tok/s, avg latency 1.256s (2 run(s))
- mistral-small:22b avg 61.811 tok/s, avg latency 1.817s (2 run(s))
- gemma3:27b avg 43.901 tok/s, avg latency 3.005s (2 run(s))
- llama4:latest avg 10.261 tok/s, avg latency 7.113s (2 run(s))
- qwq:32b avg 40.316 tok/s, avg latency 2.808s (2 run(s))
- translategemma:27b avg 44.508 tok/s, avg latency 2.963s (2 run(s))
- nemotron-3-nano:30b avg 71.974 tok/s, avg latency 2.057s (2 run(s))
2026-03-11 - Weekly benchmark run (15 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.7
- qwen3:8b avg 133.856 tok/s, avg latency 1.505s (2 run(s))
- deepseek-r1:14b avg 78.036 tok/s, avg latency 2.085s (2 run(s))
- qwen2.5:14b avg 84.947 tok/s, avg latency 0.975s (2 run(s))
- qwen3-coder:30b avg 159.939 tok/s, avg latency 0.999s (2 run(s))
- qwen3.5:35b avg 48.862 tok/s, avg latency 2.779s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.795 tok/s, avg latency 14.959s (2 run(s))
- qwen2.5-coder:32b avg 40.501 tok/s, avg latency 1.427s (2 run(s))
- ministral-3:14b avg 88.299 tok/s, avg latency 2.128s (2 run(s))
- gpt-oss:20b avg 164.561 tok/s, avg latency 1.261s (2 run(s))
- mistral-small:22b avg 61.266 tok/s, avg latency 1.832s (2 run(s))
- gemma3:27b avg 44.105 tok/s, avg latency 2.985s (2 run(s))
- llama4:latest avg 9.922 tok/s, avg latency 7.352s (2 run(s))
- qwq:32b avg 40.228 tok/s, avg latency 2.816s (2 run(s))
- translategemma:27b avg 44.392 tok/s, avg latency 2.977s (2 run(s))
- nemotron-3-nano:30b avg 64.382 tok/s, avg latency 2.279s (2 run(s))
2026-03-10 - Weekly benchmark run (15 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.7
- qwen3:8b avg 123.869 tok/s, avg latency 1.605s (2 run(s))
- deepseek-r1:14b avg 75.992 tok/s, avg latency 2.129s (2 run(s))
- qwen2.5:14b avg 74.701 tok/s, avg latency 1.087s (2 run(s))
- qwen3-coder:30b avg 145.65 tok/s, avg latency 1.052s (2 run(s))
- qwen3.5:27b avg 34.516 tok/s, avg latency 3.586s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.542 tok/s, avg latency 16.02s (2 run(s))
- qwen2.5-coder:32b avg 37.898 tok/s, avg latency 1.515s (2 run(s))
- ministral-3:14b avg 81.981 tok/s, avg latency 2.238s (2 run(s))
- gpt-oss:20b avg 155.293 tok/s, avg latency 1.337s (2 run(s))
- mistral-small:22b avg 57.675 tok/s, avg latency 1.929s (2 run(s))
- gemma3:27b avg 41.481 tok/s, avg latency 3.173s (2 run(s))
- llama4:latest avg 9.421 tok/s, avg latency 7.736s (2 run(s))
- qwq:32b avg 37.512 tok/s, avg latency 3.039s (2 run(s))
- translategemma:27b avg 41.227 tok/s, avg latency 3.158s (2 run(s))
- nemotron-3-nano:30b avg 63.069 tok/s, avg latency 2.354s (2 run(s))
2026-03-04 - Weekly benchmark run (15 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.1
- qwen3:8b avg 134.362 tok/s, avg latency 1.364s (2 run(s))
- deepseek-r1:14b avg 82.558 tok/s, avg latency 2.113s (2 run(s))
- qwen2.5:14b avg 76.749 tok/s, avg latency 1.31s (2 run(s))
- qwen3-coder:30b avg 155.539 tok/s, avg latency 1.058s (2 run(s))
- qwen3.5:27b avg 37.644 tok/s, avg latency 3.294s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.317 tok/s, avg latency 16.877s (2 run(s))
- qwen2.5-coder:32b avg 40.578 tok/s, avg latency 1.703s (2 run(s))
- ministral-3:14b avg 87.978 tok/s, avg latency 2.092s (2 run(s))
- gpt-oss:20b avg 162.802 tok/s, avg latency 1.292s (2 run(s))
- mistral-small:22b avg 61.806 tok/s, avg latency 1.915s (2 run(s))
- gemma3:27b avg 43.9 tok/s, avg latency 2.973s (2 run(s))
- llama4:latest avg 8.649 tok/s, avg latency 8.465s (2 run(s))
- qwq:32b avg 39.563 tok/s, avg latency 2.994s (2 run(s))
- translategemma:27b avg 43.197 tok/s, avg latency 2.941s (2 run(s))
- glm-4.7-flash:bf16 avg 11.236 tok/s, avg latency 9.291s (2 run(s))
2026-02-28 - Weekly benchmark run (8 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.1
- qwen3:8b avg 127.768 tok/s, avg latency 1.456s (2 run(s))
- deepseek-r1:14b avg 80.197 tok/s, avg latency 2.057s (2 run(s))
- qwen2.5:14b avg 81.419 tok/s, avg latency 1.066s (2 run(s))
- qwen3-coder:30b avg 146.471 tok/s, avg latency 1.049s (2 run(s))
- qwen3.5:35b avg 44.616 tok/s, avg latency 2.863s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.425 tok/s, avg latency 16.199s (2 run(s))
- qwen2.5-coder:32b avg 39.517 tok/s, avg latency 1.592s (2 run(s))
- ministral-3:14b avg 84.121 tok/s, avg latency 2.078s (2 run(s))
2026-02-26 - Weekly benchmark run (10 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.1
- qwen3:8b avg 120.308 tok/s, avg latency 1.541s (2 run(s))
- deepseek-r1:14b avg 75.207 tok/s, avg latency 2.193s (2 run(s))
- qwen2.5:14b avg 75.589 tok/s, avg latency 1.181s (2 run(s))
- qwen3-coder:30b avg 146.34 tok/s, avg latency 0.956s (2 run(s))
- qwen3.5:35b avg 35.885 tok/s, avg latency 3.398s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.264 tok/s, avg latency 17.04s (2 run(s))
- qwen2.5-coder:32b avg 36.705 tok/s, avg latency 1.74s (2 run(s))
- ministral-3:14b avg 78.304 tok/s, avg latency 2.174s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.287 tok/s, avg latency 16.933s (2 run(s))
- qwen3.5:122b avg 4.931 tok/s, avg latency 11.915s (2 run(s))
2026-02-26 - Weekly benchmark run (9 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.1
- qwen3:8b avg 107.883 tok/s, avg latency 1.647s (2 run(s))
- deepseek-r1:14b avg 73.173 tok/s, avg latency 2.241s (2 run(s))
- qwen2.5:14b avg 72.306 tok/s, avg latency 1.179s (2 run(s))
- qwen3-coder:30b avg 137.942 tok/s, avg latency 1.054s (2 run(s))
- qwen3.5:35b avg 37.205 tok/s, avg latency 3.314s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.097 tok/s, avg latency 17.93s (2 run(s))
- qwen2.5-coder:32b avg 35.103 tok/s, avg latency 1.813s (2 run(s))
- ministral-3:14b avg 76.292 tok/s, avg latency 2.241s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.099 tok/s, avg latency 17.93s (2 run(s))
- Skipped/failed targets: qwen3.5:122b
2026-02-26 - Weekly benchmark run (9 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.1
- qwen3:8b avg 135.395 tok/s, avg latency 1.335s (2 run(s))
- deepseek-r1:14b avg 83.511 tok/s, avg latency 1.975s (2 run(s))
- qwen2.5:14b avg 82.662 tok/s, avg latency 1.064s (2 run(s))
- qwen3-coder:30b avg 156.868 tok/s, avg latency 0.916s (2 run(s))
- qwen3.5:35b avg 39.607 tok/s, avg latency 3.096s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.679 tok/s, avg latency 15.118s (2 run(s))
- qwen2.5-coder:32b avg 41.156 tok/s, avg latency 1.551s (2 run(s))
- ministral-3:14b avg 89.221 tok/s, avg latency 1.925s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.339 tok/s, avg latency 16.654s (2 run(s))
- Skipped/failed targets: qwen3.5:122b
2026-02-26 - Weekly benchmark run (8 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.1
- qwen3:8b avg 134.19 tok/s, avg latency 1.45s (2 run(s))
- deepseek-r1:14b avg 80.727 tok/s, avg latency 2.152s (2 run(s))
- qwen2.5:14b avg 84.519 tok/s, avg latency 1.144s (2 run(s))
- qwen3-coder:30b avg 158.341 tok/s, avg latency 0.995s (2 run(s))
- qwen3.5:35b avg 43.642 tok/s, avg latency 2.984s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.534 tok/s, avg latency 15.872s (2 run(s))
- qwen2.5-coder:32b avg 40.172 tok/s, avg latency 1.679s (2 run(s))
- ministral-3:14b avg 88.031 tok/s, avg latency 2.075s (2 run(s))
2026-02-26 - Weekly benchmark run (7 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 127.626 tok/s, avg latency 1.454s (2 run(s))
- deepseek-r1:14b avg 83.486 tok/s, avg latency 2.046s (2 run(s))
- qwen2.5:14b avg 85.259 tok/s, avg latency 1.02s (2 run(s))
- qwen3-coder:30b avg 160.271 tok/s, avg latency 0.913s (2 run(s))
- llama3.3:70b-instruct-q4_k_m avg 3.611 tok/s, avg latency 15.433s (2 run(s))
- qwen2.5-coder:32b avg 38.646 tok/s, avg latency 1.658s (2 run(s))
- ministral-3:14b avg 84.704 tok/s, avg latency 2.103s (2 run(s))
- Skipped/failed targets: qwen3.5:35b
2026-02-25 - Weekly benchmark run (4 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 125.76 tok/s, avg latency 1.124s (2 run(s))
- deepseek-r1:14b avg 76.564 tok/s, avg latency 1.81s (2 run(s))
- qwen2.5:14b avg 77.17 tok/s, avg latency 0.791s (2 run(s))
- qwen3-coder:30b avg 149.66 tok/s, avg latency 0.638s (2 run(s))
- Skipped/failed targets: qwen3.5:35b
2026-02-25 - Weekly benchmark run (4 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 122.331 tok/s, avg latency 1.144s (2 run(s))
- deepseek-r1:14b avg 76.727 tok/s, avg latency 1.809s (2 run(s))
- qwen2.5:14b avg 78.328 tok/s, avg latency 0.782s (2 run(s))
- qwen3-coder:30b avg 149.0 tok/s, avg latency 0.633s (2 run(s))
- Skipped/failed targets: qwen3.5:35b
2026-02-25 - Weekly benchmark run (4 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 126.637 tok/s, avg latency 1.116s (2 run(s))
- deepseek-r1:14b avg 77.514 tok/s, avg latency 1.788s (2 run(s))
- qwen2.5:14b avg 77.945 tok/s, avg latency 0.78s (2 run(s))
- qwen3-coder:30b avg 148.375 tok/s, avg latency 0.632s (2 run(s))
- Skipped/failed targets: qwen3.5:35b
2026-02-25 - Weekly benchmark run (4 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 126.856 tok/s, avg latency 1.112s (2 run(s))
- deepseek-r1:14b avg 77.219 tok/s, avg latency 1.795s (2 run(s))
- qwen2.5:14b avg 78.059 tok/s, avg latency 0.784s (2 run(s))
- qwen3-coder:30b avg 149.915 tok/s, avg latency 0.631s (2 run(s))
- Skipped/failed targets: qwen3.5:35b
2026-02-25 - Weekly benchmark run (4 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 126.388 tok/s, avg latency 1.116s (2 run(s))
- deepseek-r1:14b avg 77.147 tok/s, avg latency 1.799s (2 run(s))
- qwen2.5:14b avg 77.858 tok/s, avg latency 0.784s (2 run(s))
- qwen3-coder:30b avg 148.992 tok/s, avg latency 0.631s (2 run(s))
- Skipped/failed targets: qwen3.5:35b
2026-02-25 - Weekly benchmark run (4 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 125.142 tok/s, avg latency 1.129s (2 run(s))
- deepseek-r1:14b avg 76.665 tok/s, avg latency 1.816s (2 run(s))
- qwen2.5:14b avg 77.367 tok/s, avg latency 0.799s (2 run(s))
- qwen3-coder:30b avg 147.144 tok/s, avg latency 0.643s (2 run(s))
2026-02-25 - Weekly benchmark run (4 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.0
- qwen3:8b avg 126.44 tok/s, avg latency 1.119s (2 run(s))
- deepseek-r1:14b avg 75.704 tok/s, avg latency 1.83s (2 run(s))
- qwen2.5:14b avg 78.324 tok/s, avg latency 0.789s (2 run(s))
- qwen3-coder:30b avg 146.745 tok/s, avg latency 0.64s (2 run(s))
2026-02-24 - Weekly benchmark run (1 successful model(s))
NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.15.6
- qwen3:8b avg 122.951 tok/s, avg latency 1.146s (2 run(s))
- Skipped/failed targets: llama3:8b, deepseek-r1:8b
2026-02-24 - Llama 70B Q4 sustained run updated
RTX 3090 | CUDA 12.4 | Ollama 0.5.7
- 1-hour sustained run completed with updated thermal profile
- Token throughput increased by 12% versus previous baseline
- OOM threshold at 16K context reconfirmed
2026-02-23 - Qwen 32B Q5 local-vs-cloud cost refresh
RTX 3090 + RunPod A6000
- ROI table recalculated with latest cloud hourly pricing
- Cloud fallback CTA switched for profiles whose optimal VRAM exceeds 32 GB
2026-02-22 - DeepSeek-R1 quantization benchmark batch
RTX 3090 | WSL2 | Ollama
- Added Q4/Q5/Q8 comparison
- Updated quality notes for long-context prompts
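
Run-to-run changes in the entries above can be checked mechanically. The following is a minimal sketch, not the tooling that generates this page: it assumes result lines keep the exact "- model avg X tok/s, avg latency Ys (N run(s))" shape used in this log, and the helper names (`parse_run`, `throughput_change`) and the 10% flagging threshold are hypothetical choices made for illustration.

```python
import re

# One result line as it appears in this log, e.g.
# "- qwen3:8b avg 125.651 tok/s, avg latency 1.554s (2 run(s))"
ENTRY = re.compile(
    r"^- (?P<model>\S+) avg (?P<tps>[\d.]+) tok/s, "
    r"avg latency (?P<latency>[\d.]+)s \((?P<runs>\d+) run\(s\)\)$"
)


def parse_run(lines):
    """Map model name -> (tok/s, latency in seconds) for one run block.

    Header, hardware, and "Skipped/failed targets" lines do not match the
    pattern and are ignored. If a model is listed twice in one block, the
    later line overwrites the earlier one.
    """
    results = {}
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:
            results[m["model"]] = (float(m["tps"]), float(m["latency"]))
    return results


def throughput_change(old, new):
    """Percent change in tok/s for models present in both runs."""
    return {
        model: round((new[model][0] - old[model][0]) / old[model][0] * 100, 1)
        for model in new
        if model in old
    }


# Three entries copied from the 2026-03-15 and 2026-04-01 runs above.
run_0315 = parse_run([
    "NVIDIA GeForce RTX 3090 | driver 591.74 | Ollama 0.17.7",
    "- qwen3:8b avg 137.231 tok/s, avg latency 1.488s (2 run(s))",
    "- qwen3-coder:30b avg 160.041 tok/s, avg latency 0.983s (2 run(s))",
    "- gpt-oss:20b avg 165.995 tok/s, avg latency 1.256s (2 run(s))",
])
run_0401 = parse_run([
    "NVIDIA GeForce RTX 3090 | driver 591.86 | Ollama 0.17.7",
    "- qwen3:8b avg 125.651 tok/s, avg latency 1.554s (2 run(s))",
    "- qwen3-coder:30b avg 153.373 tok/s, avg latency 0.961s (2 run(s))",
    "- gpt-oss:20b avg 29.585 tok/s, avg latency 4.118s (2 run(s))",
])

changes = throughput_change(run_0315, run_0401)

# Hypothetical threshold: flag anything that lost more than 10% throughput.
THRESHOLD_PCT = -10.0
flagged = sorted(m for m, pct in changes.items() if pct < THRESHOLD_PCT)

for model, pct in changes.items():
    print(f"{model}: {pct:+.1f}% tok/s")
print("flagged:", flagged)
```

With these three entries, only gpt-oss:20b falls below the assumed threshold. The sketch reports the size of a change, not its cause.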