Local AI

Run Local

Find open models that fit your hardware. Filter by tool-use support and context window, and get the install command for Ollama or LM Studio.

Updated March 22, 2026 · New models are released constantly — contributions welcome
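The fit grades below come down to how much memory a quantized model needs versus what you have. A back-of-the-envelope sketch — the bits-per-weight figure, overhead allowance, and grade thresholds here are our assumptions, not this page's exact formula:

```python
def estimate_vram_gb(params_b: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 1.0) -> float:
    """Rough VRAM needed to load a model.

    params_b: parameter count in billions (e.g. 8 for an 8B model).
    bits_per_weight: ~4.5 for the Q4_K_M quantization Ollama pulls by
    default; ~16 for unquantized fp16.
    overhead_gb: rough allowance for KV cache and runtime buffers.
    """
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb + overhead_gb


def fit_grade(model_vram_gb: float, your_vram_gb: float) -> str:
    """Map headroom to an S..F scale (thresholds are assumed)."""
    if model_vram_gb > your_vram_gb:
        return "F"  # can't run
    ratio = model_vram_gb / your_vram_gb
    if ratio <= 0.5:
        return "S"
    if ratio <= 0.65:
        return "A"
    if ratio <= 0.8:
        return "B"
    if ratio <= 0.9:
        return "C"
    return "D"


print(estimate_vram_gb(8))            # → 5.5 (GB, for an 8B model at ~Q4)
print(fit_grade(estimate_vram_gb(8), 8))
```

So an 8B model at a 4-bit-class quantization wants roughly 5–6 GB, which is why the 7B/8B cards below list 5GB VRAM.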

Your Hardware

GPU VRAM (select to filter models)

System RAM (used when running without a GPU)

Tool calling

Fit ratings: S = Runs great · A = Runs well · B = Runs fine · C = Tight fit · D = Very tight · F = Can't run

Llama 3.2 1B

Meta · 1B

S

Lightest Llama. Runs on anything.

1GB VRAM · 128K ctx · Partial tool calling
ollama run llama3.2:1b

Llama 3.2 3B

Tool use

Meta · 3B

S

Fast and capable for everyday tasks.

Code · Agents
2GB VRAM · 128K ctx · Full tool calling
ollama run llama3.2:3b
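"Full tool calling" means the model can emit structured function calls you then execute and feed back. A minimal sketch of that loop using the OpenAI-style tool schema Ollama's chat API accepts — `get_weather` is a hypothetical tool, and the model's response is simulated here rather than fetched from a running server:

```python
import json

# OpenAI-style tool schema, as accepted by Ollama's chat API.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def dispatch(tool_call: dict) -> str:
    """Execute a tool call returned by the model and format the result
    as a string to feed back into the chat as a tool message."""
    name = tool_call["function"]["name"]
    args = tool_call["function"]["arguments"]
    if name == "get_weather":
        # Stub result; a real implementation would hit a weather API.
        return json.dumps({"city": args["city"], "temp_c": 18})
    raise ValueError(f"unknown tool: {name}")


# Simulated tool call; a live run would get this from e.g.
#   ollama.chat(model="llama3.2:3b", messages=..., tools=[get_weather_tool])
fake_call = {"function": {"name": "get_weather",
                          "arguments": {"city": "Oslo"}}}
print(dispatch(fake_call))
```

Models marked "Partial tool calling" tend to produce this structure less reliably, so the dispatcher should be prepared for malformed calls.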

Qwen 2.5 0.5B

Alibaba · 0.5B

S

Tiny but surprisingly capable.

0.5GB VRAM · 128K ctx · No tool calling
ollama run qwen2.5:0.5b

Qwen 2.5 3B

Tool use

Alibaba · 3B

S

Strong multilingual support.

Code · Agents
2GB VRAM · 128K ctx · Full tool calling
ollama run qwen2.5:3b

Gemma 3 1B

Google · 1B

S

Google's lightest model. Runs on anything.

1GB VRAM · 32K ctx · No tool calling
ollama run gemma3:1b

Phi-3 Mini

Microsoft · 3.8B

A

Punches above its weight. Great for coding.

Code
2.5GB VRAM · 128K ctx · Partial tool calling
ollama run phi3:mini

Gemma 3 4B

Tool use

Google · 4B

B

Strong at 4B. Solid tool use and multilingual support.

Code · Agents
3GB VRAM · 128K ctx · Full tool calling
ollama run gemma3:4b

Llama 3.1 8B

Tool use

Meta · 8B

D

Best quality/size ratio for most use cases.

Code · Agents
5GB VRAM · 128K ctx · Full tool calling
ollama run llama3.1:8b

Qwen 2.5 7B

Tool use

Alibaba · 7B

D

Excellent code and tool use at 7B scale.

Code · Agents
5GB VRAM · 128K ctx · Full tool calling
ollama run qwen2.5:7b

Mistral 7B

Mistral · 7B

D

Fast, efficient, great instruction following.

Code
5GB VRAM · 32K ctx · Partial tool calling
ollama run mistral:7b

DeepSeek R1 7B

DeepSeek · 7B

D

Reasoning model. Chain-of-thought distilled.

Reasoning
5GB VRAM · 128K ctx · Partial tool calling
ollama run deepseek-r1:7b
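R1-family models emit their chain-of-thought inside `<think>...</think>` tags before the final answer, so if you only want the answer you have to strip that block yourself. A small sketch — the tag convention is DeepSeek's, the helper name is ours:

```python
import re


def split_reasoning(text: str) -> tuple[str, str]:
    """Split an R1-style response into (chain-of-thought, final answer)."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()
    thought = m.group(1).strip()
    # Everything outside the <think> block is the visible answer.
    answer = (text[:m.start()] + text[m.end():]).strip()
    return thought, answer


raw = "<think>2 squared is 4, plus 1 is 5.</think>The answer is 5."
thought, answer = split_reasoning(raw)
print(answer)  # → The answer is 5.
```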

CodeLlama 7B

Meta · 7B

D

Solid general code generation.

Code
5GB VRAM · 16K ctx · No tool calling
ollama run codellama:7b

LLaVA 7B

LLaVA Team · 7B

D

Vision + language. Understands images.

Vision
5GB VRAM · 4K ctx · No tool calling
ollama run llava:7b
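Vision models take images through the same chat API: Ollama accepts base64-encoded images in an `images` list on the message. A sketch of building that payload — `vision_message` is a hypothetical helper, and the image bytes here are faked rather than read from disk:

```python
import base64


def vision_message(prompt: str, image_bytes: bytes) -> dict:
    """Build a user message for Ollama's /api/chat endpoint.
    Images go in an `images` list as base64 strings, alongside
    the text prompt."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {"role": "user", "content": prompt, "images": [b64]}


# Would be sent as: POST /api/chat {"model": "llava:7b", "messages": [msg]}
msg = vision_message("What is in this image?", b"\x89PNG...fake bytes")
```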

Llama 3.3 70B

Tool use

Meta · 70B

F

Better than 3.1 70B with the same VRAM. Meta's best open 70B.

Code · Agents
40GB VRAM · 128K ctx · Full tool calling
ollama run llama3.3:70b

Llama 3.1 70B

Tool use

Meta · 70B

F

Near-frontier quality locally.

Code · Agents
40GB VRAM · 128K ctx · Full tool calling
ollama run llama3.1:70b

Llama 3.1 405B

Tool use

Meta · 405B

F

Meta's largest open model. Needs serious hardware.

Code · Agents
230GB VRAM · 128K ctx · Full tool calling
ollama run llama3.1:405b

Qwen 2.5 14B

Tool use

Alibaba · 14B

F

Strong all-rounder, great for coding.

Code · Agents
9GB VRAM · 128K ctx · Full tool calling
ollama run qwen2.5:14b

Qwen 2.5 32B

Tool use

Alibaba · 32B

F

Top open-source quality below 70B.

Code · Agents
20GB VRAM · 128K ctx · Full tool calling
ollama run qwen2.5:32b

Qwen 2.5 72B

Tool use

Alibaba · 72B

F

Flagship Qwen. Best open multilingual model.

Code · Agents
42GB VRAM · 128K ctx · Full tool calling
ollama run qwen2.5:72b

Mixtral 8×7B

Mistral · 47B MoE

F

MoE model with 12.9B active params.

Code
26GB VRAM · 32K ctx · Partial tool calling
ollama run mixtral:8x7b

Mistral Small 22B

Tool use

Mistral · 22B

F

Capable small Mistral for coding and agents.

Code · Agents
14GB VRAM · 32K ctx · Full tool calling
ollama run mistral-small

Mistral Small 3

Tool use

Mistral · 24B

F

Latest Mistral Small. Faster and more accurate than its predecessor.

Code · Agents
15GB VRAM · 32K ctx · Full tool calling
ollama run mistral-small3

Phi-3 Medium

Microsoft · 14B

F

Strong reasoning for a 14B model.

Code
9GB VRAM · 128K ctx · Partial tool calling
ollama run phi3:medium

Phi-4

Tool use

Microsoft · 14B

F

Latest Phi. Excellent at STEM and tool use.

Code · Agents
9GB VRAM · 16K ctx · Full tool calling
ollama run phi4

Gemma 3 12B

Tool use

Google · 12B

F

Best Gemma 3 for everyday tasks. Great instruction following.

Code · Agents
8GB VRAM · 128K ctx · Full tool calling
ollama run gemma3:12b

Gemma 3 27B

Tool use

Google · 27B

F

Google's flagship open model. Rivals much larger models.

Code · Agents
17GB VRAM · 128K ctx · Full tool calling
ollama run gemma3:27b

DeepSeek V3

Tool use

DeepSeek · 671B MoE

F

DeepSeek's flagship general model. Rivals GPT-4 class. Needs serious hardware.

Code · Agents
80GB VRAM · 128K ctx · Full tool calling
ollama run deepseek-v3

DeepSeek R1 14B

DeepSeek · 14B

F

Strong reasoning at 14B.

Reasoning
9GB VRAM · 128K ctx · Partial tool calling
ollama run deepseek-r1:14b

DeepSeek R1 32B

DeepSeek · 32B

F

Top open reasoning model.

Reasoning
20GB VRAM · 128K ctx · Partial tool calling
ollama run deepseek-r1:32b

DeepSeek Coder V2 16B

DeepSeek · 16B MoE

F

Best open coding model.

Code
10GB VRAM · 32K ctx · Partial tool calling
ollama run deepseek-coder-v2:16b

CodeLlama 34B

Meta · 34B

F

Best CodeLlama variant.

Code
21GB VRAM · 16K ctx · No tool calling
ollama run codellama:34b

Llama 3.2 Vision 11B

Tool use

Meta · 11B

F

Meta's multimodal model with tool calling.

Vision · Agents
7GB VRAM · 128K ctx · Full tool calling
ollama run llama3.2-vision:11b