Open-Source Models

This group contains 471 model profiles. Use this hub page to compare each model's practical VRAM floor, optimal VRAM budget, and best local-vs-cloud path; open a row's Detail link for the full profile.
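The estimated VRAM floors below follow a standard rule of thumb: parameter count times bits per weight, plus runtime overhead for the KV cache and buffers. A minimal sketch of that arithmetic (the bits-per-weight values and the 20% overhead factor are illustrative assumptions, not this page's exact formula):

```python
# Rule-of-thumb VRAM estimator for a dense model at a given quantization.
# Assumptions: ~20% overhead for KV cache and runtime buffers; effective
# bits-per-weight figures are approximate (real GGUF quants vary by scheme).

BITS_PER_WEIGHT = {"FP16": 16, "Q8": 8.5, "Q5": 5.5, "Q4": 4.5}

def estimate_vram_gb(params_billion: float, quant: str, overhead: float = 1.2) -> float:
    """Estimate VRAM (GB) needed to load the model weights plus runtime overhead."""
    weight_gb = params_billion * BITS_PER_WEIGHT[quant] / 8  # bits -> bytes
    return round(weight_gb * overhead, 1)

# A 7B model at Q4 lands in the single-digit-GB range, consistent with the
# table's small-model floors; actual usage depends on context length and engine.
print(estimate_vram_gb(7, "Q4"))
```

Treat the result as a floor check, not a guarantee: long contexts and batch serving can push real usage well above it.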

Model Data VRAM min VRAM optimal Best local GPU Cloud fallback Detail
DeepSeek-R1 671B FP16 Estimated 430GB 442GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek-R1 671B Q4 Estimated 418GB 428GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek-R1 671B Q5 Estimated 420GB 430GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek-R1 671B Q8 Estimated 424GB 434GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek-V3 671B FP16 Estimated 430GB 442GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek-V3 671B Q4 Estimated 418GB 428GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek-V3 671B Q5 Estimated 420GB 430GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek-V3 671B Q8 Estimated 424GB 434GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 Coder 480B CLOUD Estimated 215GB 223GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 Coder 480B FP16 Estimated 225GB 237GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 Coder 480B Q4 Estimated 213GB 223GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 Coder 480B Q5 Estimated 215GB 225GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 Coder 480B Q8 Estimated 219GB 229GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3.5 397B-A17B CLOUD Estimated 215GB 223GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek Coder V2 236B FP16 Estimated 150GB 162GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek Coder V2 236B Q4 Estimated 138GB 148GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek Coder V2 236B Q5 Estimated 140GB 150GB Cloud-first (no practical single-GPU local) H100/H200 class Open
DeepSeek Coder V2 236B Q8 Estimated 144GB 154GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 235B FP16 Estimated 150GB 162GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 235B Q4 Estimated 138GB 148GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 235B Q5 Estimated 140GB 150GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 235B Q8 Estimated 144GB 154GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 VL 235B CLOUD Estimated 140GB 148GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 VL 235B FP16 Estimated 150GB 162GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 VL 235B Q4 Estimated 138GB 148GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 VL 235B Q5 Estimated 140GB 150GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3 VL 235B Q8 Estimated 144GB 154GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Mixtral 8x22B FP16 Estimated 150GB 162GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Mixtral 8x22B Q4 Estimated 138GB 148GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Mixtral 8x22B Q5 Estimated 140GB 150GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Mixtral 8x22B Q8 Estimated 144GB 154GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3.5 122B FP16 Estimated 150GB 162GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3.5 122B Q4 Estimated 138GB 148GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3.5 122B Q5 Estimated 140GB 150GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen3.5 122B Q8 Estimated 144GB 154GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen 110B FP16 Estimated 80GB 92GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen 110B Q4 Estimated 68GB 78GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen 110B Q5 Estimated 70GB 80GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen 110B Q8 Estimated 74GB 84GB Cloud-first (no practical single-GPU local) H100/H200 class Open
Qwen 72B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen 72B Q4 Estimated 38GB 48GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 72B Q5 Estimated 40GB 50GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen 72B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2 72B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2 72B Q4 Estimated 38GB 48GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2 72B Q5 Estimated 40GB 50GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2 72B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2.5 72B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2.5 72B Q4 Estimated 38GB 48GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 72B Q5 Estimated 40GB 50GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2.5 72B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2.5 VL 72B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2.5 VL 72B Q4 Estimated 38GB 48GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 VL 72B Q5 Estimated 40GB 50GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen2.5 VL 72B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
DeepSeek-R1 70B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
DeepSeek-R1 70B Q4 Estimated 38GB 48GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 70B Q5 Estimated 40GB 50GB Dual RTX 4090 (model parallel) A100 80GB Open
DeepSeek-R1 70B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
DeepSeek-R1 67B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
DeepSeek-R1 67B Q4 Estimated 24GB 30GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 67B Q5 Estimated 30GB 32GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 67B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
DeepSeek-V3 67B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
DeepSeek-V3 67B Q4 Estimated 24GB 30GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-V3 67B Q5 Estimated 30GB 32GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-V3 67B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
Mixtral 8x7B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
Mixtral 8x7B Q4 Estimated 20GB 24GB RTX 3090 24GB A6000 48GB Open
Mixtral 8x7B Q5 Estimated 24GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Mixtral 8x7B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen3.5 35B FP16 Estimated 50GB 62GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen3.5 35B Q4 Estimated 38GB 48GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3.5 35B Q5 Estimated 40GB 50GB Dual RTX 4090 (model parallel) A100 80GB Open
Qwen3.5 35B Q8 Estimated 44GB 54GB Dual RTX 4090 (model parallel) A100 80GB Open
LLaVA 34B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
LLaVA 34B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
LLaVA 34B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
LLaVA 34B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder 33B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder 33B Q4 Estimated 16GB 20GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 33B Q5 Estimated 20GB 22GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 33B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 32B Q4 Measured 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 32B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 32B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 32B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 32B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 32B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 Coder 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 Coder 32B Q4 Estimated 16GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 32B Q5 Estimated 20GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 VL 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 VL 32B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 VL 32B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 VL 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 32B Q4 Measured 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 32B Q5 Measured 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 32B CLOUD Estimated 20GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 32B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 32B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
QwQ 32B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
QwQ 32B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
QwQ 32B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
QwQ 32B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Nemotron 3 Nano 30B FP16 Measured 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Nemotron 3 Nano 30B Q4 Measured 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Nemotron 3 Nano 30B Q5 Measured 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Nemotron 3 Nano 30B Q8 Measured 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 30B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 30B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 30B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 30B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 Coder 30B CLOUD Estimated 20GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 Coder 30B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 Coder 30B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 Coder 30B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 Coder 30B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 30B CLOUD Estimated 20GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 30B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 30B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 30B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 30B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 2 27B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 2 27B Q4 Estimated 16GB 20GB RTX 3090 24GB A6000 48GB Open
Gemma 2 27B Q5 Estimated 20GB 22GB RTX 3090 24GB A6000 48GB Open
Gemma 2 27B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 27B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 27B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 27B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 27B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Translategemma 27B FP16 Measured 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Translategemma 27B Q4 Measured 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Translategemma 27B Q5 Measured 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Translategemma 27B Q8 Measured 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Magistral 24B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
Magistral 24B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Magistral 24B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Magistral 24B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder V2 16B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder V2 16B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder V2 16B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder V2 16B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
StarCoder2 15B FP16 Estimated 30GB 42GB RTX 6000 Ada 48GB A100 80GB Open
StarCoder2 15B Q4 Estimated 18GB 28GB RTX 6000 Ada 48GB A100 80GB Open
StarCoder2 15B Q5 Estimated 20GB 30GB RTX 6000 Ada 48GB A100 80GB Open
StarCoder2 15B Q8 Estimated 24GB 34GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Ministral 3 14B FP16 Measured 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Ministral 3 14B Q4 Measured 10GB 20GB RTX 3090 24GB A6000 48GB Open
Ministral 3 14B Q5 Measured 12GB 22GB RTX 3090 24GB A6000 48GB Open
Ministral 3 14B Q8 Measured 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Phi-3 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Phi-3 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Phi-3 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Phi-3 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Phi-4 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Phi-4 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Phi-4 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Phi-4 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Phi-4 Reasoning 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Phi-4 Reasoning 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Phi-4 Reasoning 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Phi-4 Reasoning 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 Coder 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 Coder 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 14B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 14B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen3 14B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen3 14B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
LLaVA 13B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
LLaVA 13B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
LLaVA 13B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
LLaVA 13B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
OLMo 2 13B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
OLMo 2 13B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
OLMo 2 13B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
OLMo 2 13B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 12B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 12B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Gemma 3 12B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Gemma 3 12B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Mistral Nemo 12B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Mistral Nemo 12B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Mistral Nemo 12B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Mistral Nemo 12B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Falcon 3 10B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Falcon 3 10B Q4 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Falcon 3 10B Q5 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Falcon 3 10B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 2 9B FP16 Estimated 22GB 34GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 2 9B Q4 Estimated 10GB 12GB RTX 3090 24GB A6000 48GB Open
Gemma 2 9B Q5 Estimated 12GB 14GB RTX 3090 24GB A6000 48GB Open
Gemma 2 9B Q8 Estimated 16GB 26GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 8B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 8B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 8B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 8B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Dolphin 3 8B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Dolphin 3 8B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Dolphin 3 8B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Dolphin 3 8B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
LLaVA Llama3 8B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
LLaVA Llama3 8B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
LLaVA Llama3 8B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
LLaVA Llama3 8B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
MiniCPM-V 8B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
MiniCPM-V 8B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
MiniCPM-V 8B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
MiniCPM-V 8B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen3 8B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 8B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen3 8B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen3 8B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 8B CLOUD Estimated 8GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 8B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 8B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 8B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 8B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
CodeGemma 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
CodeGemma 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
CodeGemma 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
CodeGemma 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Falcon 3 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Falcon 3 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Falcon 3 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Falcon 3 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Gemma 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Gemma 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Gemma 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
GLM 4.7 Flash 7B FP16 Measured 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
GLM 4.7 Flash 7B Q4 Measured 6GB 16GB RTX 3090 24GB A6000 48GB Open
GLM 4.7 Flash 7B Q5 Measured 8GB 18GB RTX 3090 24GB A6000 48GB Open
GLM 4.7 Flash 7B Q8 Measured 12GB 22GB RTX 3090 24GB A6000 48GB Open
LLaVA 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
LLaVA 7B Q4 Estimated 8GB 10GB RTX 3090 24GB A6000 48GB Open
LLaVA 7B Q5 Estimated 10GB 12GB RTX 3090 24GB A6000 48GB Open
LLaVA 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Mistral 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Mistral 7B Q4 Estimated 8GB 10GB RTX 3090 24GB A6000 48GB Open
Mistral 7B Q5 Estimated 10GB 12GB RTX 3090 24GB A6000 48GB Open
Mistral 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
OLMo 2 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
OLMo 2 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
OLMo 2 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
OLMo 2 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
OpenHermes 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
OpenHermes 7B Q4 Estimated 8GB 10GB RTX 3090 24GB A6000 48GB Open
OpenHermes 7B Q5 Estimated 10GB 12GB RTX 3090 24GB A6000 48GB Open
OpenHermes 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen2 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen2 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 Coder 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 VL 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 VL 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 VL 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 VL 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
StarCoder2 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
StarCoder2 7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
StarCoder2 7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
StarCoder2 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Zephyr 7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
Zephyr 7B Q4 Estimated 8GB 10GB RTX 3090 24GB A6000 48GB Open
Zephyr 7B Q5 Estimated 10GB 12GB RTX 3090 24GB A6000 48GB Open
Zephyr 7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 6.7B FP16 Estimated 18GB 30GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder 6.7B Q4 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 6.7B Q5 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 6.7B Q8 Estimated 12GB 22GB RTX 3090 24GB A6000 48GB Open
Gemma 3 4B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 4B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Gemma 3 4B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Gemma 3 4B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Gemma 3n E4B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3n E4B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Gemma 3n E4B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Gemma 3n E4B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen 4B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 4B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen 4B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen 4B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen3 4B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 4B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen3 4B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen3 4B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 4B CLOUD Estimated 6GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 4B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 4B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 4B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 4B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Phi-3 3.8B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Phi-3 3.8B Q4 Estimated 4GB 6GB RTX 3090 24GB A6000 48GB Open
Phi-3 3.8B Q5 Estimated 5GB 8GB RTX 3090 24GB A6000 48GB Open
Phi-3 3.8B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Falcon 3 3B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Falcon 3 3B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Falcon 3 3B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Falcon 3 3B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Granite 3.1 MoE 3B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Granite 3.1 MoE 3B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Granite 3.1 MoE 3B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Granite 3.1 MoE 3B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 3B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 3B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 3B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 3B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 3B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 Coder 3B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 3B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 3B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 VL 3B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 VL 3B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 VL 3B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 VL 3B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
StarCoder2 3B FP16 Estimated 16GB 28GB RTX 6000 Ada 48GB A100 80GB Open
StarCoder2 3B Q4 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
StarCoder2 3B Q5 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
StarCoder2 3B Q8 Estimated 10GB 20GB RTX 3090 24GB A6000 48GB Open
CodeGemma 2B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
CodeGemma 2B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
CodeGemma 2B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
CodeGemma 2B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Gemma 2 2B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 2 2B Q4 Estimated 2GB 4GB RTX 3090 24GB A6000 48GB Open
Gemma 2 2B Q5 Estimated 3GB 6GB RTX 3090 24GB A6000 48GB Open
Gemma 2 2B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Gemma 2B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 2B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Gemma 2B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Gemma 2B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Gemma 3n E2B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3n E2B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Gemma 3n E2B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Gemma 3n E2B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 2B CLOUD Estimated 4GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 2B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 VL 2B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 2B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen3 VL 2B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen 1.8B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen 1.8B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen 1.8B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen 1.8B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen3 1.7B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 1.7B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen3 1.7B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen3 1.7B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
SmolLM2 1.7B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
SmolLM2 1.7B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
SmolLM2 1.7B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
SmolLM2 1.7B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 1.5B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek-R1 1.5B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 1.5B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
DeepSeek-R1 1.5B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen2 1.5B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2 1.5B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen2 1.5B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen2 1.5B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 1.5B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 1.5B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 1.5B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 1.5B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 1.5B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen2.5 Coder 1.5B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 1.5B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 1.5B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 1.3B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
DeepSeek Coder 1.3B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 1.3B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
DeepSeek Coder 1.3B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
TinyLlama 1.1B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
TinyLlama 1.1B Q4 Estimated 2GB 4GB RTX 3090 24GB A6000 48GB Open
TinyLlama 1.1B Q5 Estimated 3GB 6GB RTX 3090 24GB A6000 48GB Open
TinyLlama 1.1B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Falcon 3 1B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Falcon 3 1B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Falcon 3 1B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Falcon 3 1B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Gemma 3 1B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Gemma 3 1B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Gemma 3 1B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Gemma 3 1B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Granite 3.1 MoE 1B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Granite 3.1 MoE 1B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Granite 3.1 MoE 1B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Granite 3.1 MoE 1B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
Qwen3 0.6B FP16 Estimated 14GB 26GB RTX 6000 Ada 48GB A100 80GB Open
Qwen3 0.6B Q4 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen3 0.6B Q5 Estimated 4GB 14GB RTX 3090 24GB A6000 48GB Open
Qwen3 0.6B Q8 Estimated 8GB 18GB RTX 3090 24GB A6000 48GB Open
BGE-M3 567M FP16 Estimated 4GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen 0.5B FP16 Estimated 12GB 24GB RTX 3090 24GB A6000 48GB Open
Qwen 0.5B Q4 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Qwen 0.5B Q5 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen 0.5B Q8 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2 0.5B FP16 Estimated 12GB 24GB RTX 3090 24GB A6000 48GB Open
Qwen2 0.5B Q4 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Qwen2 0.5B Q5 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen2 0.5B Q8 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 0.5B FP16 Estimated 12GB 24GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 0.5B Q4 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 0.5B Q5 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 0.5B Q8 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 0.5B FP16 Estimated 12GB 24GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 0.5B Q4 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 0.5B Q5 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Qwen2.5 Coder 0.5B Q8 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
SmolLM2 360M FP16 Estimated 12GB 24GB RTX 3090 24GB A6000 48GB Open
SmolLM2 360M Q4 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
SmolLM2 360M Q5 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
SmolLM2 360M Q8 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
MXBAI Embed Large 335M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Snowflake Arctic Embed 335M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Gemma 3 270M FP16 Estimated 12GB 24GB RTX 3090 24GB A6000 48GB Open
Gemma 3 270M Q4 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Gemma 3 270M Q5 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
Gemma 3 270M Q8 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Nomic Embed Text 137M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Snowflake Arctic Embed 137M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
SmolLM2 135M FP16 Estimated 12GB 24GB RTX 3090 24GB A6000 48GB Open
SmolLM2 135M Q4 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
SmolLM2 135M Q5 Estimated 2GB 12GB RTX 3090 24GB A6000 48GB Open
SmolLM2 135M Q8 Estimated 6GB 16GB RTX 3090 24GB A6000 48GB Open
Snowflake Arctic Embed 110M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
All-MiniLM 33M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Snowflake Arctic Embed 33M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
All-MiniLM 22M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
Snowflake Arctic Embed 22M FP16 Estimated 2GB 10GB RTX 3090 24GB A6000 48GB Open
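The "Best local GPU" and "Cloud fallback" columns above follow a simple tiering: fit on one consumer or workstation GPU if the VRAM floor allows, split across two GPUs if it fits in double, otherwise go cloud. A hypothetical sketch of that decision (the thresholds are assumptions inferred from the table, not this page's actual selection logic):

```python
# Approximate the table's local-vs-cloud tiering from a model's VRAM floor
# and the VRAM of the GPU you have on hand. Thresholds are illustrative.

def pick_path(vram_min_gb: int, local_vram_gb: int) -> str:
    """Return a recommended deployment path for a given VRAM floor."""
    if vram_min_gb <= local_vram_gb:
        return "local single GPU"
    if vram_min_gb <= 2 * local_vram_gb:
        return "local dual GPU (model parallel)"
    return "cloud (H100/H200 class)"

# A 72B-class Q4 floor (~38GB) on a 24GB RTX 3090 lands in the dual-GPU tier.
print(pick_path(38, 24))
```

Real deployments also weigh interconnect bandwidth between GPUs and per-hour cloud cost, which this sketch ignores.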

We may earn a commission if you click links on this page.