Hardware detected: GPU, VRAM, bandwidth (BW), RAM, CPU cores

Estimates based on browser APIs. Actual specs may vary.

Grades: Runs great · Runs well · Decent · Tight fit · Barely runs · Too heavy

Llama 3.1 8B

Meta · 8B · Llama 3.1 Community

Meta's versatile 8B model with a great quality/speed ratio

4.6 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2024-07 · Architecture: Dense

Tags: chat, code, reasoning

DeepSeek R1

DeepSeek · 671B · active 37B · MIT

Massive MoE reasoning model with 37B active parameters

344.2 GB · 64K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-01 · Architecture: MoE (37B active)

Tags: reasoning

DeepSeek V3.2

DeepSeek · 685B · active 37B · MIT

State-of-the-art MoE with 37B active parameters

351.4 GB · 128K ctx · 6mo ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-12 · Architecture: MoE (37B active)

Tags: chat, code, reasoning

GPT-OSS 120B

OpenAI · 117B · active 5.1B · Apache 2.0

OpenAI's flagship open-weight MoE, 52.6% on SWE-bench

60.4 GB · 128K ctx · 10mo ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-08 · Architecture: MoE (5.1B active)

Tags: chat, reasoning, code

DeepSeek R1 Distill 32B

DeepSeek · 32B · MIT

R1 reasoning distilled into Qwen 32B: a sweet spot

16.9 GB · 64K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-01 · Architecture: Dense

Tags: reasoning

GPT-OSS 20B

OpenAI · 21B · active 3.6B · Apache 2.0

OpenAI's open-weight MoE with configurable reasoning

11.3 GB · 128K ctx · 10mo ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-08 · Architecture: MoE (3.6B active)

Tags: chat, reasoning, code

Llama 3.3 70B

Meta · 70B · Llama 3.3 Community

Best open model in the 70B class

36.4 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2024-12 · Architecture: Dense

Tags: chat, reasoning, code

Gemma 3 27B

Google · 27B · Gemma

Google's flagship Gemma 3 model

14.3 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-03 · Architecture: Dense

Tags: chat, vision, reasoning

Qwen 2.5 Coder 32B

Alibaba · 32B · Apache 2.0

Best open-source coding model at release

16.9 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2024-11 · Architecture: Dense

Tags: code

Qwen 3 32B

Alibaba · 32B · Apache 2.0

Qwen 3 flagship dense model

16.9 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-04 · Architecture: Dense

Tags: chat, code, reasoning

Mistral Small 3.1 24B

Mistral AI · 24B · Apache 2.0

Multimodal Mistral with vision support

12.8 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-03 · Architecture: Dense

Tags: chat, vision, code

Llama 4 Scout 17B

Meta · 109B · active 17B · Llama 4 Community

MoE with 16 experts, 17B active parameters

56.3 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-04 · Architecture: MoE (17B active)

Tags: chat, vision, reasoning

Gemma 3 4B

Google · 4B · Gemma

Google's compact multimodal model

3.0 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-03 · Architecture: Dense

Tags: chat, vision

Llama 3.2 1B

Meta · 1B · Llama 3.2 Community

Meta's smallest Llama for edge devices

1.0 GB · 128K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2024-09 · Architecture: Dense

Tags: chat, edge

Phi-4 14B

Microsoft · 14B · MIT

Microsoft's reasoning-focused model

7.7 GB · 16K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2024-12 · Architecture: Dense

Tags: reasoning, code

Devstral 2 123B

Mistral AI · 123B · MRL

Dense 123B coding model, 72.2% on SWE-bench Verified

63.5 GB · 256K ctx · 6mo ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-12 · Architecture: Dense

Tags: code

Qwen 3.5 9B

Alibaba · 9B · Apache 2.0

Mid-size multimodal Qwen 3.5 model

5.1 GB · 32K ctx · 4mo ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2026-02 · Architecture: Dense

Tags: chat, vision

Kimi K2

Moonshot AI · 1T · active 32B · Kimi

1T-parameter MoE with 384 experts and 32B active; strong agentic coding

512.7 GB · 128K ctx · 11mo ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-07 · Architecture: MoE (32B active)

Tags: chat, reasoning, code

Phi-4 Mini 3.8B

Microsoft · 3.8B · MIT

Microsoft's compact reasoning model

2.8 GB · 16K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-02 · Architecture: Dense

Tags: chat, code, reasoning

Qwen 3 0.6B

Alibaba · 600M · Apache 2.0

Ultra-light Qwen 3 model for constrained devices

0.8 GB · 32K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-04 · Architecture: Dense

Tags: chat, edge

Qwen 3 1.7B

Alibaba · 1.7B · Apache 2.0

Compact Qwen 3 for mobile and edge

1.5 GB · 32K ctx · 1y ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2025-04 · Architecture: Dense

Tags: chat, edge

Qwen 3.5 0.8B

Alibaba · 800M · Apache 2.0

Ultra-tiny model for embedded and edge

0.9 GB · 32K ctx · 4mo ago

Quantizations: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, F16

Released 2026-02 · Architecture: Dense

Tags: chat, edge

How do I know which AI model I can run locally?

CanIRunAi detects your GPU, VRAM, RAM, and CPU cores directly in your browser, then matches your hardware against 20+ popular AI models. Models that fit in your VRAM get grades S/A/B, while those requiring more memory get lower grades.
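The matching step can be pictured with a small sketch. This is not CanIRunAi's actual grading logic: the `Hardware` class, the `grade` function, and every cutoff below are assumptions chosen for illustration. Only the six labels come from this page.

```python
from dataclasses import dataclass

# Labels as they appear on this page; the cutoffs used below are assumptions.
GRADES = ["Runs great", "Runs well", "Decent", "Tight fit", "Barely runs", "Too heavy"]


@dataclass
class Hardware:
    vram_gb: float
    ram_gb: float


def grade(model_gb: float, hw: Hardware) -> str:
    """Map a model's size to a label using hypothetical thresholds."""
    if hw.vram_gb > 0:
        usage = model_gb / hw.vram_gb  # share of VRAM the model would occupy
        if usage <= 0.5:
            return GRADES[0]
        if usage <= 0.75:
            return GRADES[1]
        if usage <= 1.0:
            return GRADES[2]
    # The model does not fit in VRAM: fall back to VRAM plus system RAM.
    total = hw.vram_gb + hw.ram_gb
    if model_gb <= 0.5 * total:
        return GRADES[3]
    if model_gb <= total:
        return GRADES[4]
    return GRADES[5]


if __name__ == "__main__":
    hw = Hardware(vram_gb=12, ram_gb=32)
    for name, size in [("Llama 3.1 8B", 4.6), ("Llama 3.3 70B", 36.4), ("DeepSeek R1", 344.2)]:
        print(name, "->", grade(size, hw))
```

The sizes in the usage loop are the ones listed on this page; a real grader would presumably also account for context length and memory bandwidth.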

What is the minimum GPU needed to run AI models locally?

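For 7-8B parameter models (like Llama 3.1 8B) at Q4 quantization, 8 GB of VRAM is a practical minimum. Larger models need far more: Llama 3.3 70B is listed on this page at 36.4 GB, so a 24 GB card can only run it with a lower quantization or with part of the model offloaded to system RAM. CPU-only inference is possible but significantly slower.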

What is quantization and why does it matter?

Quantization stores model weights at lower precision (for example 8-bit or 4-bit instead of 16-bit), cutting memory use by roughly 2-4x, usually with only a small quality loss. Q4_K_M is a widely used quantization level because it balances size and quality well.
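The 2-4x figure follows from simple arithmetic, sketched here under an idealized assumption: every weight takes exactly the nominal number of bits. The helper name `weights_gb` is hypothetical. Real formats such as Q4_K_M also store scaling data, so actual files come out somewhat larger than this estimate.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Idealized weight size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # An 8B-parameter model at 16-bit, 8-bit, and 4-bit precision.
    for bits in (16, 8, 4):
        size = weights_gb(8, bits)
        print(f"{bits}-bit: {size:.1f} GB ({16 / bits:.0f}x smaller than 16-bit)")
```

Going from 16-bit to 8-bit halves the size and going to 4-bit quarters it, which is where the 2-4x range comes from.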

Can I run AI models without a GPU?

Yes. llama.cpp and Ollama both support CPU inference, but expect speeds 5-10x slower than on a GPU. For best results, use smaller models (1-4B parameters) and make sure you have enough system RAM (at least 2x the model size).
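As a rough sketch, the two rules of thumb in this answer can be written as checks. The function names are hypothetical, and the 2x RAM factor and the 5-10x slowdown are taken directly from the text above, not measured.

```python
def enough_ram_for_cpu(model_gb: float, ram_gb: float, factor: float = 2.0) -> bool:
    """Apply the rule of thumb: system RAM should be at least `factor` x model size."""
    return ram_gb >= factor * model_gb


def cpu_speed_range(gpu_tokens_per_s: float) -> tuple[float, float]:
    """Estimate CPU throughput as 5-10x slower than a given GPU figure."""
    return gpu_tokens_per_s / 10, gpu_tokens_per_s / 5


if __name__ == "__main__":
    # Gemma 3 4B is listed at 3.0 GB on this page; 8 GB of RAM satisfies the 2x rule.
    print(enough_ram_for_cpu(3.0, 8))
    # A hypothetical GPU speed of 50 tokens/s gives the CPU estimate range.
    print(cpu_speed_range(50.0))
```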

Which is the best open-source AI model for coding?

As of 2025, DeepSeek R1 Distill 32B and Qwen 2.5 Coder 32B are top choices for coding tasks. For lighter hardware, Phi-4 Mini 3.8B offers good coding ability with minimal resource requirements.
