7B-8B Models

This group contains 109 profiles. Use this hub page to compare each profile's practical VRAM floor, recommended optimal VRAM, best local GPU, and cloud fallback path.

| Model | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback |
|---|---|---|---|---|---|
| DeepSeek-R1 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Dolphin 3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Dolphin 3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Dolphin 3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Dolphin 3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Llama 3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3 8B Q4 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB |
| Llama 3 8B Q5 | Estimated | 12GB | 14GB | RTX 3090 24GB | A6000 48GB |
| Llama 3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.1 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 3.1 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.1 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Llama 3.1 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| LLaVA Llama3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA Llama3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| LLaVA Llama3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| LLaVA Llama3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| MiniCPM-V 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| MiniCPM-V 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| MiniCPM-V 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| MiniCPM-V 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B CLOUD | Estimated | 8GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen3 VL 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Qwen3 VL 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| CodeGemma 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| CodeGemma 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| CodeGemma 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| CodeGemma 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| CodeLlama 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| CodeLlama 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB |
| CodeLlama 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB |
| CodeLlama 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek-R1 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek-R1 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Falcon 3 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Falcon 3 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Gemma 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Gemma 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Gemma 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Gemma 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Glm 4.7 Flash 7B FP16 | Measured | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Glm 4.7 Flash 7B Q4 | Measured | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Glm 4.7 Flash 7B Q5 | Measured | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Glm 4.7 Flash 7B Q8 | Measured | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Llama 2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Llama 2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Llama 2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Llama 2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| LLaVA 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| LLaVA 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB |
| LLaVA 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB |
| LLaVA 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Mistral 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Mistral 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB |
| Mistral 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB |
| Mistral 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| OLMo 2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| OLMo 2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| OpenHermes 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| OpenHermes 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB |
| OpenHermes 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB |
| OpenHermes 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Qwen 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Qwen2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 Coder 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 Coder 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Qwen2.5 VL 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| Qwen2.5 VL 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| StarCoder2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| StarCoder2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| StarCoder2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| StarCoder2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| Zephyr 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| Zephyr 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB |
| Zephyr 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB |
| Zephyr 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek Coder 6.7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB |
| DeepSeek Coder 6.7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek Coder 6.7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB |
| DeepSeek Coder 6.7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB |
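If you need a number for a model or quantization not listed above, a rough rule of thumb is: weight footprint equals parameter count times bytes per weight for the quantization, plus a few GB for the KV cache and runtime buffers. The sketch below illustrates that arithmetic; the bits-per-weight figures and the flat 2GB overhead are simplifying assumptions, not the exact method behind this table, and real overhead grows with context length and batch size.

```python
# Rough VRAM estimate for a dense transformer.
# Assumed effective bytes per weight for each quantization level:
BYTES_PER_PARAM = {"FP16": 2.0, "Q8": 1.0, "Q5": 0.625, "Q4": 0.5}

def estimate_vram_gb(params_b: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Estimate VRAM in GB for a model with params_b billion parameters.

    overhead_gb is an assumed flat allowance for KV cache and runtime
    buffers; long contexts and larger batches need more.
    """
    weights_gb = params_b * BYTES_PER_PARAM[quant]
    return round(weights_gb + overhead_gb, 1)

print(estimate_vram_gb(8, "Q4"))    # 6.0 GB, matching the table's Q4 floor
print(estimate_vram_gb(8, "FP16"))  # 18.0 GB, matching the FP16 floor
```

This reproduces the table's 6GB Q4 and 18GB FP16 floors for 8B models; treat the output as a lower bound, not a guarantee.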
