Anthropic's new Claude Code CLI tool has brought powerful "agentic coding" directly to the terminal. It can index your workspace, edit files, and execute commands. However, running these operations entirely on cloud models like Claude 3.5 Sonnet can rack up API costs quickly for intensive users.
The solution? As recently detailed by Eric Tech, you can connect Claude Code to Ollama, allowing you to run capable AI models locally on your own machine. This setup gives you unlimited, free agentic coding on your own hardware.
What is Ollama?
Ollama is a lightweight framework that allows you to download and run large language models (LLMs) entirely offline on your Mac, Windows, or Linux machine. By acting as a local server, it can seamlessly replace paid API endpoints for tools that support custom model configurations.
Before You Start: Hardware Requirements
Local AI performance is strictly bound by your computer's RAM (Unified Memory on Apple Silicon or VRAM on dedicated GPUs):
- 16GB RAM: The minimum recommended for running 8-billion parameter (8b) models smoothly.
- 32GB+ RAM: Recommended for a fluid developer experience or running larger models (20b+).
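Before pulling a model, it's worth checking how much memory your machine actually has. This quick check is just a convenience sketch (not part of the Ollama setup) and covers Linux via `/proc/meminfo` and macOS via `sysctl`:

```shell
# Report installed RAM in GB: Linux exposes /proc/meminfo, macOS uses sysctl.
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  mem_gb=$((mem_kb / 1024 / 1024))
else
  mem_gb=$(($(sysctl -n hw.memsize) / 1024 / 1024 / 1024))
fi
echo "Installed RAM: ${mem_gb} GB"
```

Note that on a machine with a dedicated GPU, VRAM (not system RAM) is the figure that matters, and you'd check it with your GPU vendor's tooling instead.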
Step-by-step Setup Guide
Step 1: Install Ollama
First, grab the installer for your OS from the official Ollama website and run it. Once installed, Ollama will start a background service on your machine.
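You can confirm the background service is actually listening before moving on. Ollama serves an HTTP API on port 11434 by default, and its version endpoint responds whenever the service is up (this sketch assumes `curl` is available):

```shell
# Probe Ollama's default API port; the version endpoint answers if the service is up.
if curl -fsS http://localhost:11434/api/version >/dev/null 2>&1; then
  ollama_status="up"
else
  ollama_status="down"
fi
echo "Ollama service: ${ollama_status}"
```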
Step 2: Choose and Pull a Local Model
With Ollama running, open your terminal and pull the model you want to use. The current recommendation for the best balance of speed and coding accuracy is Qwen3-Coder (8b).

ollama pull qwen3-coder:8b
For users with 64GB+ RAM, larger models like gpt-oss (20b or 120b) are available for enhanced reasoning.
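After the download finishes, `ollama list` shows every model stored locally, which is a quick way to confirm the pull succeeded. The snippet below guards against `ollama` not being on your PATH, so it degrades gracefully:

```shell
# List local models and flag whether the coder model is present.
if command -v ollama >/dev/null 2>&1; then
  if ollama list | grep -q "qwen3-coder"; then
    model_status="present"
  else
    model_status="missing"
  fi
else
  model_status="ollama-not-installed"
fi
echo "qwen3-coder: ${model_status}"
```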
Step 3: Connect Claude Code
With your local model downloaded and Ollama running in the background, you can now launch Claude Code and point it to your local server instead of Anthropic's cloud.
claude --model ollama/qwen3-coder:8b
That's it! You are now running an agentic coding session powered entirely by your local hardware. You can ask Claude Code to examine files, refactor algorithms, or write tests without spending a dime on API credits.
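Beyond the interactive session, Claude Code's print mode (`-p`) lets you fire one-shot prompts at the local model from scripts. The prompt text here is only an example, and the snippet skips gracefully when `claude` isn't installed:

```shell
# One-shot, non-interactive query against the local model (print mode).
if command -v claude >/dev/null 2>&1; then
  claude --model ollama/qwen3-coder:8b -p "Write a unit test for utils.py"
  run_status="ran"
else
  run_status="claude-not-installed"
fi
echo "claude: ${run_status}"
```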
The Free Cloud Alternative: GLM-4.7
If your computer doesn't have the required 16GB+ of RAM to run local models efficiently, there is a powerful free-tier cloud alternative.
The GLM-4.7 model (by Zhipu AI) offers coding performance comparable to top-tier models. It is available locally via Ollama, but Zhipu also offers a generous free cloud API tier. You can configure Claude Code to use the GLM cloud endpoint with an API key from their developer portal, giving you a "local-like" free experience without the heavy hardware requirements.
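Claude Code reads its endpoint and credentials from environment variables, so pointing it at an Anthropic-compatible GLM endpoint comes down to two exports before launch. The variable names below are Claude Code's documented overrides, but the URL is a placeholder; copy the real endpoint and key from Zhipu's developer portal:

```shell
# Placeholder values: substitute the endpoint and API key from Zhipu's developer portal.
export ANTHROPIC_BASE_URL="https://example-glm-endpoint/anthropic"  # placeholder URL
export ANTHROPIC_AUTH_TOKEN="your-glm-api-key"                      # your key here
claude
```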
Final Verdict: The Hybrid Workflow
The ultimate developer setup in 2026 isn't strictly cloud or strictly local: it's hybrid.
Use local models like Qwen3-Coder via Ollama for high-volume day-to-day tasks: generating boilerplate, writing tests, and simple refactoring. Then reserve your paid Claude Pro queries for complex architectural reasoning or intricate debugging.
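One lightweight way to wire this habit into your shell is a pair of aliases, one per backend. The alias names are arbitrary, and the local one assumes the Ollama setup from the steps above:

```shell
# cc-local: free local model via Ollama; cc-cloud: the default paid Anthropic backend.
alias cc-local='claude --model ollama/qwen3-coder:8b'
alias cc-cloud='claude'
alias cc-local   # print the definition to confirm it registered
```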