7B-8B Models
109 profiles in this group. Use this hub page to compare each profile's practical VRAM floor, expected throughput, and best local-vs-cloud path. The Data column marks whether a row's figures are estimated or measured.
| Model | Data | VRAM min | VRAM optimal | Best local GPU | Cloud fallback | Detail |
|---|---|---|---|---|---|---|
| DeepSeek-R1 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Dolphin 3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Dolphin 3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Dolphin 3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Dolphin 3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3 8B Q4 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3 8B Q5 | Estimated | 12GB | 14GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.1 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 3.1 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.1 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 3.1 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA Llama3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA Llama3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA Llama3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA Llama3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| MiniCPM-V 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| MiniCPM-V 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| MiniCPM-V 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| MiniCPM-V 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B CLOUD | Estimated | 8GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen3 VL 8B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen3 VL 8B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeGemma 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeGemma 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeLlama 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| CodeLlama 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeLlama 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| CodeLlama 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek-R1 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek-R1 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Falcon 3 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Falcon 3 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Gemma 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Gemma 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Glm 4.7 Flash 7B FP16 | Measured | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Glm 4.7 Flash 7B Q4 | Measured | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Glm 4.7 Flash 7B Q5 | Measured | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Glm 4.7 Flash 7B Q8 | Measured | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Llama 2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Llama 2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| LLaVA 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| LLaVA 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Mistral 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Mistral 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| OLMo 2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| OLMo 2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| OpenHermes 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| OpenHermes 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| OpenHermes 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| OpenHermes 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 Coder 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 Coder 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Qwen2.5 VL 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| Qwen2.5 VL 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| StarCoder2 7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| StarCoder2 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| Zephyr 7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| Zephyr 7B Q4 | Estimated | 8GB | 10GB | RTX 3090 24GB | A6000 48GB | Open |
| Zephyr 7B Q5 | Estimated | 10GB | 12GB | RTX 3090 24GB | A6000 48GB | Open |
| Zephyr 7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 6.7B FP16 | Estimated | 18GB | 30GB | RTX 6000 Ada 48GB | A100 80GB | Open |
| DeepSeek Coder 6.7B Q4 | Estimated | 6GB | 16GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 6.7B Q5 | Estimated | 8GB | 18GB | RTX 3090 24GB | A6000 48GB | Open |
| DeepSeek Coder 6.7B Q8 | Estimated | 12GB | 22GB | RTX 3090 24GB | A6000 48GB | Open |
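
One way to use the table is to shortlist profiles against the VRAM on a card you already own. The sketch below is illustrative only: the `Profile` type and the `classify` and `shortlist` helpers are hypothetical names, the rows are a small sample copied from the table above, and the "comfortable" / "tight" labels are an assumed reading of the VRAM optimal and VRAM min columns rather than anything the page defines.

```python
from typing import NamedTuple


class Profile(NamedTuple):
    """One table row, reduced to the fields needed for a fit check."""
    name: str
    vram_min_gb: int
    vram_optimal_gb: int


# A small sample of rows copied from the table above (not the full list).
PROFILES = [
    Profile("Llama 3.1 8B FP16", 18, 30),
    Profile("Llama 3.1 8B Q4", 6, 16),
    Profile("Llama 3.1 8B Q5", 8, 18),
    Profile("Llama 3.1 8B Q8", 12, 22),
    Profile("Mistral 7B Q4", 8, 10),
    Profile("Mistral 7B Q5", 10, 12),
]


def classify(profile: Profile, gpu_vram_gb: int) -> str:
    """Label how a profile fits a GPU with the given VRAM (assumed labels)."""
    if gpu_vram_gb >= profile.vram_optimal_gb:
        return "comfortable"  # meets the "VRAM optimal" column
    if gpu_vram_gb >= profile.vram_min_gb:
        return "tight"  # meets only the "VRAM min" column
    return "does not fit"


def shortlist(profiles: list[Profile], gpu_vram_gb: int) -> list[tuple[str, str]]:
    """Return (name, label) pairs for every profile that fits at all."""
    results = []
    for profile in profiles:
        label = classify(profile, gpu_vram_gb)
        if label != "does not fit":
            results.append((profile.name, label))
    return results


if __name__ == "__main__":
    # Example: which sampled profiles fit a 12GB card?
    for name, label in shortlist(PROFILES, 12):
        print(f"{name}: {label}")
```

Because most rows are marked Estimated, treat a "tight" result as a prompt to check the profile's detail page rather than as a guarantee that the model will load.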