Can I Run AI Locally?
Find out which AI models your machine can actually run.
Hardware estimates are based on browser APIs, so your actual specs may differ.
Llama 3.1 8B
Llama 3.1 Community | Meta's versatile 8B, great quality/speed ratio
DeepSeek R1
MIT | Massive MoE reasoning model, 37B active
DeepSeek V3.2
MIT | State-of-the-art MoE, 37B active params
GPT-OSS 120B
Apache 2.0 | OpenAI's flagship open-weight MoE, 52.6% SWE-bench
DeepSeek R1 Distill 32B
MIT | R1 reasoning distilled into Qwen 32B, a sweet spot
GPT-OSS 20B
Apache 2.0 | OpenAI's open-weight MoE with configurable reasoning
Llama 3.3 70B
Llama 3.3 Community | Best open model at 70B class
Gemma 3 27B
Gemma | Google's flagship Gemma 3 model
Qwen 2.5 Coder 32B
Apache 2.0 | Best open-source coding model at release
Qwen 3 32B
Apache 2.0 | Qwen 3 flagship dense model
Mistral Small 3.1 24B
Apache 2.0 | Multimodal Mistral with vision support
Llama 4 Scout 17B
Llama 4 Community | MoE with 16 experts, 17B active params
Gemma 3 4B
Gemma | Google's compact multimodal model
Llama 3.2 1B
Llama 3.2 Community | Meta's smallest Llama for edge devices
Phi-4 14B
MIT | Microsoft's reasoning-focused model
Devstral 2 123B
MRL | Dense 123B coding model, 72.2% SWE-bench Verified
Qwen 3.5 9B
Apache 2.0 | Multimodal Qwen 3.5 mid-size
Kimi K2
Kimi | 1T-param MoE with 384 experts, 32B active, strong agentic coding
Phi-4 Mini 3.8B
MIT | Microsoft's compact reasoning model
Qwen 3 0.6B
Apache 2.0 | Ultra-light Qwen 3 model for constrained devices
Qwen 3 1.7B
Apache 2.0 | Compact Qwen 3 for mobile and edge
Qwen 3.5 0.8B
Apache 2.0 | Ultra-tiny model for embedded and edge
How do I know which AI model I can run locally?
CanIRunAi estimates your GPU, VRAM, RAM, and CPU core count directly in your browser, then matches that hardware against 20+ popular AI models. Models that fit entirely in your VRAM get a grade of S, A, or B, while models that need more memory than you have get lower grades.
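As a rough illustration of how such a grade could be derived, the sketch below compares a model's memory requirement with the available VRAM. It is not CanIRunAi's actual scoring logic: the `Model` class, the `grade` function, the cutoff ratios, and the C/D/F labels for models that do not fit are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    required_gb: float  # hypothetical memory needed at the chosen quantization


def grade(model: Model, vram_gb: float) -> str:
    """Return a letter grade. All thresholds here are illustrative assumptions."""
    if vram_gb <= 0:
        return "F"  # no usable GPU memory reported
    usage = model.required_gb / vram_gb
    if usage <= 0.5:
        return "S"  # fits with plenty of headroom
    if usage <= 0.75:
        return "A"
    if usage <= 1.0:
        return "B"  # fits, but only just
    if usage <= 1.5:
        return "C"  # does not fit; would need partial offload (assumed label)
    return "D"      # far beyond available VRAM (assumed label)


if __name__ == "__main__":
    # Memory figures below are placeholders, not measured values.
    for m in (Model("small-model", 4.0), Model("large-model", 40.0)):
        print(m.name, grade(m, vram_gb=8.0))
```

Keeping the fits-in-VRAM grades (S/A/B) separate from the does-not-fit grades mirrors the split described above; where exactly the boundaries sit is a design choice.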
What is the minimum GPU needed to run AI models locally?
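You need about 8GB of VRAM to run 7-8B parameter models (like Llama 3.1 8B) comfortably at Q4 quantization. Larger models need much more: a 70B model at Q4 has roughly 35-40GB of weights, so a single 24GB card can run it only by offloading some layers to system RAM or by using a more aggressive quantization. CPU-only inference is possible but significantly slower.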
What is quantization and why does it matter?
Quantization reduces the precision of a model's weights (from 16-bit to 8-bit, 4-bit, or lower), cutting VRAM usage by roughly 2-4x with usually modest quality loss. Q4_K_M is one of the most popular quantization levels because it offers a good balance of size and quality.
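The 2-4x figure follows from simple arithmetic: weight size is approximately parameter count times bits per weight, divided by 8. The sketch below works this out; `weight_size_gb` is a name chosen for this example, and it counts only the weights. Real Q4_K_M files use somewhat more than 4 bits per weight on average, and the KV cache and runtime buffers add to the total, so treat the results as lower bounds.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9 bytes per GB.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    baseline = weight_size_gb(8, 16)  # an 8B model at 16-bit precision
    for bits in (16, 8, 4):
        size = weight_size_gb(8, bits)
        print(f"8B at {bits}-bit: ~{size:.0f} GB of weights, "
              f"{baseline / size:.0f}x reduction vs 16-bit")
```

By the same formula, a 70B model at a nominal 4 bits per weight comes to about 35 GB of weights before any overhead.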
Can I run AI models without a GPU?
Yes, using CPU inference with llama.cpp or Ollama. However, expect speeds roughly 5-10x slower than on a GPU. For best results, use smaller models (1-4B parameters) and make sure you have enough system RAM (at least 2x the model size).
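The RAM rule of thumb above can be written as a quick check. This is a minimal sketch: the function name is invented for this example, the `factor` parameter simply encodes the "at least 2x the model size" guideline, and the sizes in the usage lines are illustrative placeholders rather than measured file sizes.

```python
def enough_ram_for_cpu(model_size_gb: float, system_ram_gb: float,
                       factor: float = 2.0) -> bool:
    """True if system RAM is at least `factor` times the model's size."""
    return system_ram_gb >= factor * model_size_gb


if __name__ == "__main__":
    # Placeholder sizes: a ~2.5 GB small model and a ~20 GB larger one.
    print(enough_ram_for_cpu(2.5, system_ram_gb=8))    # small model, 8 GB machine
    print(enough_ram_for_cpu(20.0, system_ram_gb=32))  # larger model, 32 GB machine
```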
Which is the best open-source AI model for coding?
As of 2025, DeepSeek R1 Distill 32B and Qwen 2.5 Coder 32B are top choices for coding tasks. For lighter hardware, Phi-4 Mini 3.8B offers good coding ability with minimal resource requirements.