GitHub Copilot CLI is my go-to coding agent when I work directly from the terminal. It understands my codebase, proposes edits, runs commands, and helps me move faster without leaving the command line. Because I care about privacy, offline workflows, and custom model experimentation, I decided to try running Copilot CLI entirely on local LLMs using Ollama.
No cloud dependency. No API keys. Just my machine, a local model, and my workflow.
In this post, I’ll walk through how to set it up, and how to use it effectively.
Why combine Copilot CLI with Ollama?
Copilot CLI gives you a powerful agentic interface for your codebase. Ollama gives you a fast, local model runtime with support for dozens of open models.
Together, you get:
- Local-first AI coding: keep your code and prompts on your machine
- Predictable performance: no rate limits or network delays
- Model flexibility: swap between Qwen, Llama, Mistral, Gemma, and more
- Agentic workflows: Copilot CLI can edit, run, and reason using your local model
- Offline development: perfect for secure environments or travel
This combination turns your terminal into a fully autonomous coding assistant that respects your boundaries and infrastructure.
Quick Setup: Copilot CLI + Ollama
The integration is surprisingly simple thanks to Ollama’s OpenAI‑compatible API layer.
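Before wiring anything up, it's worth checking that Ollama is running and actually exposing that layer (this assumes Ollama's default port, 11434):
# should return a JSON list of the models you have pulled locally
curl http://localhost:11434/v1/models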
1. Install Copilot CLI
You can install it via Homebrew, npm, script, or WinGet depending on your platform.
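For example, the npm route works on any platform with Node.js installed (the package name below is the one GitHub's docs use at the time of writing; the other install methods are covered there as well):
npm install -g @github/copilot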
Or on Windows with WinGet:
winget install GitHub.Copilot
2. Launch Copilot CLI with Ollama
The fastest way to start:
ollama launch copilot
This spins up Copilot CLI using Ollama’s default model.
To specify a model:
ollama launch copilot --model devstral-small-2
Or any other model you’ve pulled locally.
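If you haven't pulled a model yet, do that first and check that it shows up (devstral-small-2 is just the tag used in this post; any coding-capable model works):
ollama pull devstral-small-2
ollama list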
3. Run Copilot CLI directly
Once launched, you can use Copilot CLI as usual:
Ask questions, request code edits, or let it analyze your repository.
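For example, once you're at the prompt you can type things like (plain natural-language prompts, nothing special):
- "Explain how this repository is structured"
- "Find where configuration is loaded and add basic error handling"
- "Write unit tests for the parsing logic"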
Remark: you'll have to be patient, as Copilot CLI consumes a lot of tokens and a local model works through them far more slowly than the cloud service.
Manual Setup (Environment Variables)
If you want full control — for example in scripts, CI, or Docker — you can wire Copilot CLI to Ollama manually.
Copilot CLI connects to Ollama via the OpenAI‑compatible API:
export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
export COPILOT_PROVIDER_API_KEY=
export COPILOT_PROVIDER_WIRE_API=responses
export COPILOT_MODEL=devstral-small-2
Then run:
copilot
This is ideal for reproducible environments or when you want to pin a specific model.
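For example, a small wrapper script keeps the configuration in one place (the script name and the default model are my own choices; the variables are exactly the ones above):
#!/usr/bin/env bash
# copilot-local.sh - run Copilot CLI against a local Ollama model
set -euo pipefail

export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
export COPILOT_PROVIDER_API_KEY=             # no key needed for local Ollama
export COPILOT_PROVIDER_WIRE_API=responses
export COPILOT_MODEL=${1:-devstral-small-2}  # pass a model tag as the first argument to override

copilot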
Non‑interactive mode (CI/CD, Automation)
Copilot CLI also supports headless execution:
ollama launch copilot --model devstral-small-2 --yes -- -p "Explain how this repository works"
The --yes flag:
- auto‑pulls the model
- skips interactive prompts
- requires --model
Everything after -- is passed directly to Copilot CLI.
Perfect for automated documentation, code reviews, or repo analysis.
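As a sketch, a scheduled job could capture a repository overview for new contributors (this assumes the -p response is written to stdout; docs/overview.md is just an example target):
ollama launch copilot --model devstral-small-2 --yes -- -p "Summarize the architecture of this repository for new contributors" > docs/overview.md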
Choosing the right local model
Ollama supports a wide range of models, and Copilot CLI benefits from models with:
- Large context windows (64k+ recommended)
- Strong reasoning
- Good code understanding
Popular choices:
- Qwen 3.5 / 3.6 — excellent reasoning and long context
- Llama 3.1 — balanced performance and speed
- Mistral Nemo — lightweight and fast
- DeepSeek Coder — optimized for code generation
Cloud‑backed models may also appear in the model list, but for local workflows, the open models above shine.
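To experiment with a few of them, pull them from the Ollama library first (tags change over time, so verify the exact names on ollama.com/library):
ollama pull qwen3
ollama pull llama3.1
ollama pull mistral-nemo
ollama pull deepseek-coder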
Is it workable?
Being able to run Copilot CLI locally makes it a compelling proposition. Your code never leaves your machine, which makes it ideal for regulated environments.
It's also attractive for travel, secure networks, or environments without internet access.
Sounds good in theory.
But is it a workable solution in practice?
My short answer: no. At least, not on my machine.
Copilot CLI works best when it has a large context window available, which in practice means at least 48 GB of VRAM is recommended. If your GPU isn't powerful enough, this is unfortunately not a workable solution.
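Two quick checks tell you where you stand: how much GPU memory you actually have, and what context length your chosen model supports (ollama show prints the model's metadata):
# available and used GPU memory (NVIDIA)
nvidia-smi --query-gpu=memory.total,memory.used --format=csv
# model details, including context length
ollama show devstral-small-2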
Final thoughts
Copilot CLI already feels like a glimpse of the future of agentic development. Pairing it with Ollama brings that future fully onto your machine: private, (fast), customizable, and deeply integrated with your workflow.
If you have a powerful GPU available on your local machine, this setup is a developer‑friendly way to get started locally.