# Multimodal Vision Guide
Choose a vision-capable model by your VRAM ceiling first: start in the smallest tier whose optimal footprint fits your hardware, and move up only when you have evidence that a heavier profile is required. This keeps deployment stable and prevents over-sizing.
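The ceiling-first rule above can be sketched as a small helper. This is an illustrative function (the name and return strings are assumptions, not part of any API); the thresholds mirror the tier headings in this guide.

```python
def pick_tier(vram_gb: float) -> str:
    """Map available VRAM (GB) to the guide's tier whose optimal
    footprint still fits. Thresholds follow the tier headings:
    starter <= 12GB, balanced 13-24GB, heavy > 24GB."""
    if vram_gb <= 12:
        return "starter"
    if vram_gb <= 24:
        return "balanced"
    return "heavy"
```

For example, a 16GB card lands in the balanced tier even though several starter models would also run on it; per the rule above, only step down if the balanced profiles prove unnecessary.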
## Starter tier (≤12GB optimal)
| Model | VRAM (min / optimal) | License |
|---|---|---|
| LLaVA 7B Q4 | 8GB / 10GB | Open |
| Gemma 3 270M Q4 | 2GB / 10GB | Open |
| LLaVA 7B Q5 | 10GB / 12GB | Open |
| Qwen3 VL 2B Q4 | 2GB / 12GB | Open |
| Qwen3 VL 2B CLOUD | 4GB / 12GB | Open |
| Gemma 3n E2B Q4 | 2GB / 12GB | Open |
## Balanced tier (13GB–24GB optimal)
| Model | VRAM (min / optimal) | License |
|---|---|---|
| Llama 3.2 Vision 11B Q4 | 12GB / 14GB | Open |
| Gemma 3 4B Q4 | 4GB / 14GB | Open |
| Qwen3 VL 4B Q4 | 4GB / 14GB | Open |
| Qwen3 VL 4B CLOUD | 6GB / 14GB | Open |
| Gemma 3n E4B Q4 | 4GB / 14GB | Open |
| Qwen2.5 VL 3B Q4 | 4GB / 14GB | Open |
| Qwen3 VL 2B Q5 | 4GB / 14GB | Open |
| Gemma 3n E2B Q5 | 4GB / 14GB | Open |
## Heavy tier (>24GB optimal)
| Model | VRAM (min / optimal) | Cloud fallback | License |
|---|---|---|---|
| Llama 4 128X17B Q4 | 418GB / 428GB | H100/H200 class | Open |
| Llama 4 128X17B Q5 | 420GB / 430GB | H100/H200 class | Open |
| Llama 4 128X17B Q8 | 424GB / 434GB | H100/H200 class | Open |
| Llama 4 128X17B FP16 | 430GB / 442GB | H100/H200 class | Open |
| Qwen3.5 397B-A17B CLOUD | 215GB / 223GB | H100/H200 class | Open |
| Llama 4 16X17B Q4 | 213GB / 223GB | H100/H200 class | Open |
| Llama 4 16X17B Q5 | 215GB / 225GB | H100/H200 class | Open |
| Llama 4 16X17B Q8 | 219GB / 229GB | H100/H200 class | Open |
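Once a tier is chosen, the tables above can be filtered programmatically. A minimal sketch, assuming rows are carried as `(name, min_gb, optimal_gb)` tuples copied from this guide (the sample list below is a subset, and `runnable_models` is a hypothetical helper, not a real API):

```python
# Subset of rows from the tables above: (model name, VRAM min GB, VRAM optimal GB).
MODELS = [
    ("LLaVA 7B Q4", 8, 10),
    ("Qwen3 VL 4B Q4", 4, 14),
    ("Llama 4 16X17B Q4", 213, 223),
]

def runnable_models(models, vram_gb):
    """Return (name, fits_optimal) for every model whose minimum VRAM fits.

    fits_optimal is True when the card also meets the model's optimal
    footprint, i.e. it can run at the recommended profile rather than
    merely at the floor."""
    return [(name, vram_gb >= optimal)
            for name, minimum, optimal in models
            if vram_gb >= minimum]
```

On a 12GB card this keeps LLaVA 7B Q4 at its optimal profile, admits Qwen3 VL 4B Q4 only at its minimum, and excludes the Llama 4 MoE entirely, matching the tier boundaries above.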