Documentation — CanIRunAi

🚀

Getting Started

Learn the basics of running AI models locally.

Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
Pull a model: ollama pull llama3.1:8b
Run it: ollama run llama3.1:8b
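
Once a model is pulled, it can also be called from a script. The sketch below is a minimal example that assumes Ollama is serving its HTTP API on the default local address (http://localhost:11434) and uses only the Python standard library. The helper names build_request and generate are illustrative, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("Explain quantization in one sentence.") should return the model's reply as a string. Setting stream to False asks for a single JSON object rather than a stream of partial responses.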

🖥️

Hardware Guide

Understanding GPU, CPU, and RAM requirements.

VRAM is the most critical factor: a model runs at full speed only if its weights and context cache fit in GPU memory
8 GB VRAM: can run 7-8B parameter models at Q4 quantization
24 GB VRAM: comfortable for 13-32B models at Q4 quantization
System RAM acts as a fallback when a model does not fit in VRAM, but inference is much slower
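
The VRAM tiers above can be sanity-checked with simple arithmetic. The sketch below is a rough estimate, not a measurement. It assumes about 4.85 bits per weight for a Q4-class quantization and a flat 1.5 GB allowance for the context cache and runtime buffers; both numbers are illustrative assumptions, and real usage depends on the model and context length.

```python
def estimate_vram_gb(params_billions: float,
                     bits_per_weight: float = 4.85,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM need in GB: weight storage plus a flat overhead allowance.

    bits_per_weight and overhead_gb are assumed values for illustration.
    """
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits_in_vram(params_billions: float, vram_gb: float) -> bool:
    """True if the rough estimate is within the given VRAM budget."""
    return estimate_vram_gb(params_billions) <= vram_gb


for size_b, vram_gb in [(8, 8), (32, 24), (70, 24)]:
    print(f"{size_b}B model on {vram_gb} GB: "
          f"~{estimate_vram_gb(size_b):.1f} GB needed, fits={fits_in_vram(size_b, vram_gb)}")
```

Under these assumptions an 8B model needs a little over 6 GB and a 32B model about 21 GB, which matches the 8 GB and 24 GB tiers, while a 70B model does not fit in 24 GB.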

📦

Quantization

How quantization affects model quality and performance.

Q4_K_M: best quality/size ratio — recommended for most users
Q5_K_M: slightly better quality, ~20% more VRAM than Q4_K_M
Q8_0: near-original quality, needs close to 2x the VRAM of Q4_K_M
GGUF format is used by llama.cpp and Ollama
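
To see where these size differences come from, the sketch below compares weight sizes across quantization levels. The bits-per-weight figures are approximate values assumed for illustration; actual GGUF file sizes vary with model architecture.

```python
# Approximate bits per weight (assumed, illustrative values).
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "F16": 16.0,
}


def weight_size_gb(params_billions: float, quant: str) -> float:
    """Approximate size of the model weights in GB at a given quantization."""
    return params_billions * APPROX_BITS_PER_WEIGHT[quant] / 8


def relative_size(quant: str, baseline: str = "Q4_K_M") -> float:
    """How many times larger `quant` is than `baseline`."""
    return APPROX_BITS_PER_WEIGHT[quant] / APPROX_BITS_PER_WEIGHT[baseline]


for quant in APPROX_BITS_PER_WEIGHT:
    print(f"{quant}: ~{weight_size_gb(8, quant):.1f} GB for an 8B model, "
          f"{relative_size(quant):.2f}x Q4_K_M")
```

With these assumed figures, Q5_K_M comes out around 1.17x the size of Q4_K_M and Q8_0 around 1.75x, so the percentages listed above are best read as rough guides rather than exact ratios.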

⚙️

Inference Engines

Tools and frameworks for running models locally.

Ollama: easiest setup, great for beginners
llama.cpp: most flexible, C++ implementation
vLLM: high-throughput serving for production
LM Studio: GUI-based, cross-platform

⚡

Optimization Tips

Get the best performance from your hardware.

Use quantized models to reduce VRAM usage
Enable flash attention when available
Set context length to what you actually need; the context cache grows with it and consumes VRAM
Use GPU offloading for partial acceleration when a model does not fit entirely in VRAM
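
The advice on context length can be made concrete by estimating the key/value cache, which grows linearly with the number of tokens kept in context. The sketch below assumes a 16-bit cache and uses layer and head counts from the published Llama 3.1 8B configuration (32 layers, 8 key/value heads, head dimension 128). Treat these as assumptions and substitute the values for your own model.

```python
def kv_cache_bytes(context_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Approximate key/value cache size in bytes for one sequence."""
    # Factor 2: each layer stores one key and one value vector per KV head per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_len


# Assumed shape for Llama 3.1 8B; check your model's config before relying on it.
LLAMA_3_1_8B = {"n_layers": 32, "n_kv_heads": 8, "head_dim": 128}


def kv_cache_gib(context_len: int, **shape: int) -> float:
    """Same estimate expressed in GiB."""
    return kv_cache_bytes(context_len, **shape) / 2**30


for ctx in (4096, 8192, 32768, 131072):
    print(f"context {ctx:>6}: ~{kv_cache_gib(ctx, **LLAMA_3_1_8B):.1f} GiB of cache")
```

Under these assumptions an 8,192-token context costs about 1 GiB of cache, while the full 131,072-token context costs about 16 GiB, more than the Q4 weights themselves. Lowering the context length to what a task needs frees that memory for the model.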