Verified on RTX 3090

This page lists model tags that have been benchmarked in local runs on an RTX 3090. Use it when you want measured local performance rather than baseline estimates.

Measured model tags

| Model | Tag | Tokens/s | Latency (ms) | Test time (UTC) |
| --- | --- | --- | --- | --- |
| GPT-OSS 20B CLOUD | gpt-oss:20b | 156.1 | 1524 | 2026-04-29T05:39:58Z |
| Qwen3 Coder 30B CLOUD | qwen3-coder:30b | 140.5 | 935 | 2026-06-17T07:31:11Z |
| Qwen3 8B FP16 | qwen3:8b | 121.7 | 1429 | 2026-06-17T07:31:11Z |
| Qwen2.5 Coder 32B FP16 | qwen2.5-coder:32b | 92.2 | 1609 | 2026-06-17T07:31:11Z |
| Qwen2.5 14B FP16 | qwen2.5:14b | 84.0 | 946 | 2026-04-29T05:39:58Z |
| Ministral 3 14B FP16 | ministral-3:14b | 79.2 | 2047 | 2026-06-17T07:31:11Z |
| DeepSeek-R1 14B FP16 | deepseek-r1:14b | 74.4 | 2119 | 2026-06-17T07:31:11Z |
| Nemotron 3 Nano 30B FP16 | nemotron-3-nano:30b | 57.0 | 2468 | 2026-04-01T11:53:50Z |
| Mistral Small 22B FP16 | mistral-small:22b | 54.8 | 1937 | 2026-06-17T07:31:11Z |
| Qwen3.6 35B FP16 | qwen3.6:35b | 47.8 | 2628 | 2026-06-17T07:31:11Z |
| Translategemma 27B FP16 | translategemma:27b | 41.3 | 3142 | 2026-04-01T11:53:50Z |
| Gemma 3 27B FP16 | gemma3:27b | 39.5 | 3136 | 2026-06-17T07:31:11Z |
| QwQ 32B FP16 | qwq:32b | 35.8 | 3032 | 2026-06-17T07:31:11Z |
| Qwen3.5 35B FP16 | qwen3.5:35b | 33.8 | 3424 | 2026-06-17T07:31:11Z |
| Glm 4.7 Flash 7B FP16 | glm-4.7-flash:bf16 | 11.2 | 9291 | 2026-03-04T09:01:38Z |
| Llama 4 16X17B FP16 | llama4:16x17b | 9.1 | 7779 | 2026-06-10T06:45:58Z |
| Qwen3.5 122B FP16 | qwen3.5:122b | 4.9 | 11915 | 2026-02-26T19:19:16Z |
| Llama 3.3 70B FP16 | llama3.3:70b | 3.5 | 15676 | 2026-04-29T05:39:58Z |
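
To compare rows in practical terms, the two measured columns can be combined into a rough end-to-end time for a reply of a given length. The sketch below is only an illustration: it assumes the latency figure is a one-time delay paid before generation starts and that throughput stays constant, neither of which this page states. The function name `estimate_response_seconds` and the 500-token reply length are hypothetical choices, and the sample values are copied from the measurements listed here.

```python
def estimate_response_seconds(tokens_per_s: float, latency_ms: float, output_tokens: int) -> float:
    """Rough reply time: one-time latency plus steady-state generation time.

    Assumption (not stated by the page): latency is paid once up front,
    and tokens/s is constant for the whole reply.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return latency_ms / 1000.0 + output_tokens / tokens_per_s


# (tokens/s, latency in ms) copied from three of the measured rows.
measured = {
    "gpt-oss:20b": (156.1, 1524),
    "qwen3:8b": (121.7, 1429),
    "llama3.3:70b": (3.5, 15676),
}

if __name__ == "__main__":
    for tag, (tps, latency_ms) in measured.items():
        seconds = estimate_response_seconds(tps, latency_ms, output_tokens=500)
        print(f"{tag}: about {seconds:.1f} s for a 500-token reply")
```

Under these assumptions, throughput dominates for long replies, while latency matters more for short ones.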

Validation notes
