Verified on RTX 3090

This page tracks model tags with measured local runs on an RTX 3090. Use it when you want evidence-backed local performance numbers instead of pure baseline estimates.

Measured model tags

| Model | Precision | Tag | Tokens/s | Latency (ms) | Test time |
| --- | --- | --- | --- | --- | --- |
| Qwen3 Coder 30B | CLOUD | `qwen3-coder:30b` | 153.4 | 961 | 2026-04-01T11:53:50Z |
| Qwen3 8B | FP16 | `qwen3:8b` | 125.7 | 1554 | 2026-04-01T11:53:50Z |
| Ministral 3 14B | FP16 | `ministral-3:14b` | 82.7 | 2390 | 2026-04-01T11:53:50Z |
| DeepSeek-R1 14B | FP16 | `deepseek-r1:14b` | 80.2 | 2027 | 2026-04-01T11:53:50Z |
| Qwen2.5 14B | FP16 | `qwen2.5:14b` | 77.7 | 1072 | 2026-04-01T11:53:50Z |
| Nemotron 3 Nano 30B | FP16 | `nemotron-3-nano:30b` | 57.0 | 2468 | 2026-04-01T11:53:50Z |
| TranslateGemma 27B | FP16 | `translategemma:27b` | 41.3 | 3142 | 2026-04-01T11:53:50Z |
| Qwen2.5 Coder 32B | FP16 | `qwen2.5-coder:32b` | 37.9 | 1521 | 2026-04-01T11:53:50Z |
| Qwen3.5 35B | FP16 | `qwen3.5:35b` | 35.1 | 3585 | 2026-03-15T12:17:40Z |
| GPT-OSS 20B | CLOUD | `gpt-oss:20b` | 29.6 | 4118 | 2026-04-01T11:53:50Z |
| Mistral Small 22B | FP16 | `mistral-small:22b` | 16.7 | 5609 | 2026-04-01T11:53:50Z |
| GLM 4.7 Flash 7B | FP16 | `glm-4.7-flash:bf16` | 11.2 | 9291 | 2026-03-04T09:01:38Z |
| Gemma 3 27B | FP16 | `gemma3:27b` | 9.1 | 11605 | 2026-04-01T11:53:50Z |
| Llama 4 16x17B | FP16 | `llama4:16x17b` | 7.6 | 9383 | 2026-04-01T11:53:50Z |
| QwQ 32B | FP16 | `qwq:32b` | 6.6 | 15250 | 2026-04-01T11:53:50Z |
| Qwen3.5 122B | FP16 | `qwen3.5:122b` | 4.9 | 11915 | 2026-02-26T19:19:16Z |
| Llama 3.3 70B | FP16 | `llama3.3:70b` | 3.8 | 14959 | 2026-03-11T04:17:51Z |
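The tags above look like Ollama model tags, so a minimal sketch of how figures like these can be derived follows, assuming Ollama's `/api/generate` response fields (`eval_count`, `eval_duration`, `prompt_eval_duration`, with durations in nanoseconds); the exact measurement method behind this table is not documented here.

```python
# Hedged sketch: turn Ollama-style generation metrics into the
# tokens/s and latency figures shown in the table. Field names are
# assumed from Ollama's /api/generate response; durations are in ns.

def throughput_tokens_per_s(eval_count: int, eval_duration_ns: int) -> float:
    """Generation throughput: tokens emitted per second of eval time."""
    return eval_count / (eval_duration_ns / 1e9)

def first_token_latency_ms(prompt_eval_duration_ns: int) -> float:
    """Approximate time-to-first-token from the prompt-eval phase."""
    return prompt_eval_duration_ns / 1e6

# Illustrative response fragment (made-up numbers, not from the table):
resp = {"eval_count": 256, "eval_duration": 2_000_000_000,
        "prompt_eval_duration": 950_000_000}

tps = throughput_tokens_per_s(resp["eval_count"], resp["eval_duration"])
ttft = first_token_latency_ms(resp["prompt_eval_duration"])
print(f"{tps:.1f} tok/s, {ttft:.0f} ms")  # → 128.0 tok/s, 950 ms
```

Whether the table's latency column is time-to-first-token or total request time is not stated on this page; treat the sketch as one plausible reading.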
