Verified on RTX 3090

This page lists model tags that have measured local runs on an RTX 3090. Use it when you want measured local performance rather than baseline estimates alone.

Measured model tags

Model	Tag	Tokens/s	Latency (ms)	Test time (UTC)	Detail
Qwen3 Coder 30B CLOUD	`qwen3-coder:30b`	153.4	961	2026-04-01T11:53:50Z	Open
Qwen3 8B FP16	`qwen3:8b`	125.7	1554	2026-04-01T11:53:50Z	Open
Ministral 3 14B FP16	`ministral-3:14b`	82.7	2390	2026-04-01T11:53:50Z	Open
DeepSeek-R1 14B FP16	`deepseek-r1:14b`	80.2	2027	2026-04-01T11:53:50Z	Open
Qwen2.5 14B FP16	`qwen2.5:14b`	77.7	1072	2026-04-01T11:53:50Z	Open
Nemotron 3 Nano 30B FP16	`nemotron-3-nano:30b`	57.0	2468	2026-04-01T11:53:50Z	Open
TranslateGemma 27B FP16	`translategemma:27b`	41.3	3142	2026-04-01T11:53:50Z	Open
Qwen2.5 Coder 32B FP16	`qwen2.5-coder:32b`	37.9	1521	2026-04-01T11:53:50Z	Open
Qwen3.5 35B FP16	`qwen3.5:35b`	35.1	3585	2026-03-15T12:17:40Z	Open
GPT-OSS 20B CLOUD	`gpt-oss:20b`	29.6	4118	2026-04-01T11:53:50Z	Open
Mistral Small 22B FP16	`mistral-small:22b`	16.7	5609	2026-04-01T11:53:50Z	Open
GLM 4.7 Flash 7B FP16	`glm-4.7-flash:bf16`	11.2	9291	2026-03-04T09:01:38Z	Open
Gemma 3 27B FP16	`gemma3:27b`	9.1	11605	2026-04-01T11:53:50Z	Open
Llama 4 16X17B FP16	`llama4:16x17b`	7.6	9383	2026-04-01T11:53:50Z	Open
QwQ 32B FP16	`qwq:32b`	6.6	15250	2026-04-01T11:53:50Z	Open
Qwen3.5 122B FP16	`qwen3.5:122b`	4.9	11915	2026-02-26T19:19:16Z	Open
Llama 3.3 70B FP16	`llama3.3:70b`	3.8	14959	2026-03-11T04:17:51Z	Open
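
To compare these rows programmatically, the table can be loaded and ranked by throughput. The following is a minimal sketch, assuming the rows have been copied out as tab-separated text in the same column order as above. The names `Run`, `parse_runs`, `fastest`, and `SAMPLE` are hypothetical helpers for illustration, not part of any tool behind this page, and the sample embeds only three of the measured rows.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Run:
    model: str
    tag: str
    tokens_per_s: float
    latency_ms: int
    tested_at: datetime


def parse_runs(tsv_text: str) -> list[Run]:
    """Parse tab-separated rows: model, tag, tokens/s, latency (ms), test time."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    runs = []
    for row in reader:
        if len(row) < 5 or row[0] == "Model":
            continue  # skip blank lines and the header row
        model, tag, tps, latency, tested = row[:5]
        runs.append(
            Run(
                model=model,
                tag=tag.strip("`"),  # tags are wrapped in backticks on the page
                tokens_per_s=float(tps),
                latency_ms=int(latency),
                # Timestamps end in "Z" (UTC); use an explicit offset so
                # datetime.fromisoformat also accepts them on older Pythons.
                tested_at=datetime.fromisoformat(tested.replace("Z", "+00:00")),
            )
        )
    return runs


def fastest(runs: list[Run], min_tokens_per_s: float = 0.0) -> list[Run]:
    """Return runs at or above the threshold, highest tokens/s first."""
    keep = [r for r in runs if r.tokens_per_s >= min_tokens_per_s]
    return sorted(keep, key=lambda r: r.tokens_per_s, reverse=True)


# Three rows copied from the table above (assumed tab-separated export).
SAMPLE = (
    "Model\tTag\tTokens/s\tLatency (ms)\tTest time\tDetail\n"
    "Gemma 3 27B FP16\t`gemma3:27b`\t9.1\t11605\t2026-04-01T11:53:50Z\tOpen\n"
    "Qwen3 8B FP16\t`qwen3:8b`\t125.7\t1554\t2026-04-01T11:53:50Z\tOpen\n"
    "Qwen2.5 14B FP16\t`qwen2.5:14b`\t77.7\t1072\t2026-04-01T11:53:50Z\tOpen\n"
)

if __name__ == "__main__":
    for run in fastest(parse_runs(SAMPLE), min_tokens_per_s=30.0):
        print(f"{run.tag}: {run.tokens_per_s} tokens/s, {run.latency_ms} ms")
```

Raising `min_tokens_per_s` narrows the list to tags that meet a throughput target. Sorting on `latency_ms` instead would give a different order, since the two columns do not rank the models the same way.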

Validation notes

Measured rows come from actual benchmark snapshots, not template placeholders.
Catalog baseline values stay visible on model pages for comparison.
For heavy profiles, use the cloud fallback links once the local GPU is saturated.
