Multimodal Models Hub

Use this hub to shortlist image-capable models by memory budget. Start with local-first options, then escalate to larger cloud-first profiles only when your tasks need heavier reasoning depth.

Total multimodal profiles: 111
Local-first picks: 14
Cloud-first picks: 10
Size classes covered: 8

Local-first multimodal picks (≤24 GB optimal)

| Model | VRAM (min / optimal) | 3090 tok/s |
| --- | --- | --- |
| LLaVA 7B Q4 | 8 GB / 10 GB | 30 |
| Gemma 3 270M Q4 | 2 GB / 10 GB | 48 |
| LLaVA 7B Q5 | 10 GB / 12 GB | 27 |
| Qwen3 VL 2B Q4 | 2 GB / 12 GB | 42 |
| Qwen3 VL 2B CLOUD | 4 GB / 12 GB | 42 |
| Gemma 3n E2B Q4 | 2 GB / 12 GB | 42 |
| Gemma 3 1B Q4 | 2 GB / 12 GB | 48 |
| Gemma 3 270M Q5 | 2 GB / 12 GB | 43.2 |
| Llama 3.2 Vision 11B Q4 | 12 GB / 14 GB | 21 |
| Gemma 3 4B Q4 | 4 GB / 14 GB | 36 |
| Qwen3 VL 4B Q4 | 4 GB / 14 GB | 36 |
| Qwen3 VL 4B CLOUD | 6 GB / 14 GB | 36 |
| Gemma 3n E4B Q4 | 4 GB / 14 GB | 36 |
| Qwen2.5 VL 3B Q4 | 4 GB / 14 GB | 36 |
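Shortlisting by memory budget, as described above, amounts to filtering the table on the VRAM column and ranking by throughput. A minimal sketch (the `LOCAL_PICKS` list is an illustrative subset of the table, not the full dataset; the `fits` helper is an assumption, not part of the hub):

```python
# Illustrative subset of the local-first table: (model, min GB, optimal GB, 3090 tok/s)
LOCAL_PICKS = [
    ("LLaVA 7B Q4", 8, 10, 30),
    ("Gemma 3 270M Q4", 2, 10, 48),
    ("Llama 3.2 Vision 11B Q4", 12, 14, 21),
    ("Gemma 3 4B Q4", 4, 14, 36),
]

def fits(budget_gb, picks=LOCAL_PICKS, use_optimal=True):
    """Return picks whose VRAM requirement fits the budget, fastest first.

    use_optimal=True filters on the optimal figure (smooth operation);
    False filters on the minimum figure (tight but workable).
    """
    idx = 2 if use_optimal else 1
    return sorted((p for p in picks if p[idx] <= budget_gb),
                  key=lambda p: -p[3])
```

For example, `fits(10)` keeps only the two 10 GB-optimal entries from this subset and lists the faster Gemma variant first.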

Cloud-first multimodal picks (>24 GB optimal)

| Model | VRAM (min / optimal) | Cloud fallback |
| --- | --- | --- |
| Llama 4 128X17B Q4 | 418 GB / 428 GB | H100/H200 class |
| Llama 4 128X17B Q5 | 420 GB / 430 GB | H100/H200 class |
| Llama 4 128X17B Q8 | 424 GB / 434 GB | H100/H200 class |
| Llama 4 128X17B FP16 | 430 GB / 442 GB | H100/H200 class |
| Qwen3.5 397B-A17B CLOUD | 215 GB / 223 GB | H100/H200 class |
| Llama 4 16X17B Q4 | 213 GB / 223 GB | H100/H200 class |
| Llama 4 16X17B Q5 | 215 GB / 225 GB | H100/H200 class |
| Llama 4 16X17B Q8 | 219 GB / 229 GB | H100/H200 class |
| Llama 4 16X17B FP16 | 225 GB / 237 GB | H100/H200 class |
| Qwen3 VL 235B Q4 | 138 GB / 148 GB | H100/H200 class |
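If a model you care about is not listed, a first-order VRAM estimate follows from parameter count and quantization width: weights take roughly `params × bits / 8` bytes, plus headroom for the vision encoder, KV cache, and activations. A hedged sketch (the 1.2 overhead factor is an assumption, and the result is weights-dominated, so it will not reproduce the table's figures exactly):

```python
def estimate_vram_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB for a quantized model.

    params_b  -- parameter count in billions (e.g. 7 for a 7B model)
    bits      -- quantization width: 4 (Q4), 5 (Q5), 8 (Q8), 16 (FP16)
    overhead  -- assumed multiplier for KV cache, vision tower, activations
    """
    # params_b billion params * bits/8 bytes each ~= params_b * bits / 8 GB
    return params_b * bits / 8 * overhead
```

For instance, `estimate_vram_gb(7, 4)` gives about 4.2 GB for the weights of a 7B Q4 model; the table's larger figures also budget for image processing and comfortable context length.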