Multimodal Models

This group contains 111 model profiles. Use this hub page to compare each profile's practical VRAM floor, the VRAM recommended for comfortable headroom, and the best local-versus-cloud deployment path.
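The VRAM floors here follow the usual back-of-the-envelope rule: parameter count times bytes per weight at the given quantization, plus headroom for the vision encoder, KV cache, and runtime buffers. A minimal sketch of that arithmetic in Python (the bytes-per-weight table and the flat overhead constant are illustrative assumptions, not the exact formula behind these figures):

```python
# Back-of-the-envelope VRAM estimate for a multimodal model at a given
# quantization. The bytes-per-weight values and the flat overhead are
# illustrative assumptions, not the exact constants behind this table.

BYTES_PER_WEIGHT = {
    "FP16": 2.0,
    "Q8": 1.0,    # ~8 bits per weight
    "Q5": 0.625,  # ~5 bits per weight
    "Q4": 0.5,    # ~4 bits per weight
}

def estimate_vram_gb(params_billions: float, quant: str,
                     overhead_gb: float = 2.5) -> float:
    """Weight memory plus a flat allowance for the vision encoder,
    KV cache, and runtime buffers."""
    return params_billions * BYTES_PER_WEIGHT[quant] + overhead_gb

# Qwen2.5 VL 7B at Q4: ~3.5GB of weights plus overhead lands near the
# 6GB "VRAM min" listed in the table below.
print(f"{estimate_vram_gb(7, 'Q4'):.1f} GB")
```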

All VRAM figures below are estimates.

| Model | VRAM min (GB) | VRAM optimal (GB) | Best local GPU | Cloud fallback |
| --- | --- | --- | --- | --- |
| Llama 4 128X17B FP16 | 430 | 442 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 128X17B Q4 | 418 | 428 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 128X17B Q5 | 420 | 430 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 128X17B Q8 | 424 | 434 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3.5 397B-A17B CLOUD | 215 | 223 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16X17B FP16 | 225 | 237 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16X17B Q4 | 213 | 223 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16X17B Q5 | 215 | 225 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 4 16X17B Q8 | 219 | 229 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B CLOUD | 140 | 148 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B FP16 | 150 | 162 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B Q4 | 138 | 148 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B Q5 | 140 | 150 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen3 VL 235B Q8 | 144 | 154 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.2 Vision 90B FP16 | 80 | 92 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Llama 3.2 Vision 90B Q4 | 68 | 78 | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.2 Vision 90B Q5 | 70 | 80 | Dual RTX 4090 (model parallel) | A100 80GB |
| Llama 3.2 Vision 90B Q8 | 74 | 84 | Cloud-first (no practical single-GPU local) | H100/H200 class |
| Qwen2.5 VL 72B FP16 | 50 | 62 | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 VL 72B Q4 | 38 | 48 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 72B Q5 | 40 | 50 | Dual RTX 4090 (model parallel) | A100 80GB |
| Qwen2.5 VL 72B Q8 | 44 | 54 | Dual RTX 4090 (model parallel) | A100 80GB |
| LLaVA 34B FP16 | 30 | 42 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 34B Q4 | 18 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 34B Q5 | 20 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 34B Q8 | 24 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 32B FP16 | 30 | 42 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 32B Q4 | 18 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 32B Q5 | 20 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 32B Q8 | 24 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 32B CLOUD | 20 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 32B FP16 | 30 | 42 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 32B Q4 | 18 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 32B Q5 | 20 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 32B Q8 | 24 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 30B CLOUD | 20 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 30B FP16 | 30 | 42 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 30B Q4 | 18 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 30B Q5 | 20 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 30B Q8 | 24 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 27B FP16 | 30 | 42 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 27B Q4 | 18 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 27B Q5 | 20 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 27B Q8 | 24 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 13B FP16 | 22 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 13B Q4 | 10 | 20 | RTX 3090 24GB | A6000 48GB |
| LLaVA 13B Q5 | 12 | 22 | RTX 3090 24GB | A6000 48GB |
| LLaVA 13B Q8 | 16 | 26 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 12B FP16 | 22 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 12B Q4 | 10 | 20 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 12B Q5 | 12 | 22 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 12B Q8 | 16 | 26 | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.2 Vision 11B FP16 | 22 | 34 | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.2 Vision 11B Q4 | 12 | 14 | RTX 3090 24GB | A6000 48GB |
| Llama 3.2 Vision 11B Q5 | 14 | 16 | RTX 3090 24GB | A6000 48GB |
| Llama 3.2 Vision 11B Q8 | 16 | 26 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA Llama3 8B FP16 | 18 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA Llama3 8B Q4 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| LLaVA Llama3 8B Q5 | 8 | 18 | RTX 3090 24GB | A6000 48GB |
| LLaVA Llama3 8B Q8 | 12 | 22 | RTX 3090 24GB | A6000 48GB |
| MiniCPM-V 8B FP16 | 18 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| MiniCPM-V 8B Q4 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| MiniCPM-V 8B Q5 | 8 | 18 | RTX 3090 24GB | A6000 48GB |
| MiniCPM-V 8B Q8 | 12 | 22 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B CLOUD | 8 | 16 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B FP16 | 18 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 8B Q4 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B Q5 | 8 | 18 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B Q8 | 12 | 22 | RTX 3090 24GB | A6000 48GB |
| LLaVA 7B FP16 | 18 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 7B Q4 | 8 | 10 | RTX 3090 24GB | A6000 48GB |
| LLaVA 7B Q5 | 10 | 12 | RTX 3090 24GB | A6000 48GB |
| LLaVA 7B Q8 | 12 | 22 | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 7B FP16 | 18 | 30 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 7B Q4 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 7B Q5 | 8 | 18 | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 7B Q8 | 12 | 22 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 4B FP16 | 16 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 4B Q4 | 4 | 14 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 4B Q5 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 4B Q8 | 10 | 20 | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E4B FP16 | 16 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3n E4B Q4 | 4 | 14 | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E4B Q5 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E4B Q8 | 10 | 20 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B CLOUD | 6 | 14 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B FP16 | 16 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 4B Q4 | 4 | 14 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B Q5 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 4B Q8 | 10 | 20 | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 3B FP16 | 16 | 28 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 3B Q4 | 4 | 14 | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 3B Q5 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 3B Q8 | 10 | 20 | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E2B FP16 | 14 | 26 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3n E2B Q4 | 2 | 12 | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E2B Q5 | 4 | 14 | RTX 3090 24GB | A6000 48GB |
| Gemma 3n E2B Q8 | 8 | 18 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 2B CLOUD | 4 | 12 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 2B FP16 | 14 | 26 | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 2B Q4 | 2 | 12 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 2B Q5 | 4 | 14 | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 2B Q8 | 8 | 18 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 1B FP16 | 14 | 26 | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 3 1B Q4 | 2 | 12 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 1B Q5 | 4 | 14 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 1B Q8 | 8 | 18 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 270M FP16 | 12 | 24 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 270M Q4 | 2 | 10 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 270M Q5 | 2 | 12 | RTX 3090 24GB | A6000 48GB |
| Gemma 3 270M Q8 | 6 | 16 | RTX 3090 24GB | A6000 48GB |
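To turn the table into an automatic recommendation, compare a profile's VRAM figures against the card you have: prefer local when the optimal figure fits, accept the floor with tight context limits, and fall back to multi-GPU or cloud otherwise. A hedged sketch of that decision rule (the tier boundaries are assumptions that mirror, but do not exactly reproduce, the table's picks):

```python
# Map a profile's VRAM figures to a deployment recommendation.
# The tier logic is an illustrative assumption based on the table above,
# not an official selection rule.

def recommend(vram_min_gb: int, vram_optimal_gb: int, local_gpu_gb: int) -> str:
    if vram_optimal_gb <= local_gpu_gb:
        return "local: fits with comfortable headroom"
    if vram_min_gb <= local_gpu_gb:
        return "local: fits at the floor; expect tight context limits"
    if vram_min_gb <= 2 * local_gpu_gb:
        return "local with two GPUs (model parallel), else cloud"
    return "cloud-first: rent an A100/H100-class instance"

# Example: Qwen2.5 VL 32B Q4 (min 18GB, optimal 28GB) on a 24GB RTX 3090.
print(recommend(18, 28, 24))  # local: fits at the floor; ...
```

In practice, leave a few GB of margin beyond the floor: multi-image or long-context prompts can push KV-cache use well past the minimum.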
