GitHub Copilot CLI is my go-to coding agent when I work directly from the terminal. It understands my codebase, proposes edits, runs commands, and helps me move faster without leaving the command line. Because I care about privacy, offline workflows, and custom model experimentation, I decided to try running Copilot CLI entirely on local LLMs using Ollama.
No cloud dependency. No API keys. Just my machine, a local model, and my workflow.
In this post, I’ll walk through how to set it up, and how to use it effectively.
Why combine Copilot CLI with Ollama?
Copilot CLI gives you a powerful agentic interface for your codebase. Ollama gives you a fast, local model runtime with support for dozens of open models.
Together, you get:
- Local-first AI coding: keep your code and prompts on your machine
- Predictable performance: no rate limits or network delays
- Model flexibility: swap between Qwen, Llama, Mistral, Gemma, and more
- Agentic workflows: Copilot CLI can edit, run, and reason using your local model
- Offline development: perfect for secure environments or travel
This combination turns your terminal into a fully autonomous coding assistant that respects your boundaries and infrastructure.
Quick Setup: Copilot CLI + Ollama
The integration is surprisingly simple thanks to Ollama’s OpenAI‑compatible API layer.
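Before wiring anything up, it's worth checking that Ollama is running and actually exposing that layer (this assumes Ollama's default port, 11434):
# should return a JSON list of the models you have pulled locally
curl http://localhost:11434/v1/models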
1. Install Copilot CLI
You can install it via Homebrew, npm, script, or WinGet depending on your platform.
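For example, the npm route works on any platform with Node.js installed (the package name below is the one GitHub's docs use at the time of writing; the other install methods are covered there as well):
npm install -g @github/copilot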
Or on Windows with WinGet:
winget install GitHub.Copilot
2. Launch Copilot CLI with Ollama
The fastest way to start:
ollama launch copilot
This spins up Copilot CLI using Ollama’s default model.
To specify a model:
ollama launch copilot --model devstral-small-2
Or any other model you’ve pulled locally.
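If you haven't pulled a model yet, do that first and check that it shows up (devstral-small-2 is just the tag used in this post; any coding-capable model works):
ollama pull devstral-small-2
ollama list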
3. Run Copilot CLI directly
Once launched, you can use Copilot CLI as usual:
Ask questions, request code edits, or let it analyze your repository.
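For example, once you're at the prompt you can type things like (plain natural-language prompts, nothing special):
- "Explain how this repository is structured"
- "Find where configuration is loaded and add basic error handling"
- "Write unit tests for the parsing logic"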
Remark: you'll have to be patient, as Copilot CLI consumes a lot of tokens and a local model works through them far more slowly than the cloud service.
Manual Setup (Environment Variables)
If you want full control — for example in scripts, CI, or Docker — you can wire Copilot CLI to Ollama manually.
Copilot CLI connects to Ollama via the OpenAI‑compatible API:
export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
export COPILOT_PROVIDER_API_KEY=
export COPILOT_PROVIDER_WIRE_API=responses
export COPILOT_MODEL=devstral-small-2
Then run:
copilot
This is ideal for reproducible environments or when you want to pin a specific model.
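For example, a small wrapper script keeps the configuration in one place (the script name and the default model are my own choices; the variables are exactly the ones above):
#!/usr/bin/env bash
# copilot-local.sh - run Copilot CLI against a local Ollama model
set -euo pipefail

export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
export COPILOT_PROVIDER_API_KEY=             # no key needed for local Ollama
export COPILOT_PROVIDER_WIRE_API=responses
export COPILOT_MODEL=${1:-devstral-small-2}  # pass a model tag as the first argument to override

copilot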
Non‑interactive mode (CI/CD, Automation)
Copilot CLI also supports headless execution:
ollama launch copilot --model devstral-small-2 --yes -- -p "Explain how this repository works"
The --yes flag:
- auto‑pulls the model
- skips interactive prompts
- requires --model
Everything after -- is passed directly to Copilot CLI.
Perfect for automated documentation, code reviews, or repo analysis.
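As a sketch, a scheduled job could capture a repository overview for new contributors (this assumes the -p response is written to stdout; docs/overview.md is just an example target):
ollama launch copilot --model devstral-small-2 --yes -- -p "Summarize the architecture of this repository for new contributors" > docs/overview.md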
Choosing the right local model
Ollama supports a wide range of models, and Copilot CLI benefits from models with:
- Large context windows (64k+ recommended)
- Strong reasoning
- Good code understanding
Popular choices:
- Qwen 3.5 / 3.6 — excellent reasoning and long context
- Llama 3.1 — balanced performance and speed
- Mistral Nemo — lightweight and fast
- DeepSeek Coder — optimized for code generation
Cloud‑backed models may also appear in the model list, but for local workflows, the open models above shine.
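To experiment with a few of them, pull them from the Ollama library first (tags change over time, so verify the exact names on ollama.com/library):
ollama pull qwen3
ollama pull llama3.1
ollama pull mistral-nemo
ollama pull deepseek-coder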
Is it workable?
Being able to run Copilot CLI locally makes it a compelling proposition. Your code never leaves your machine, which makes it ideal for regulated environments.
It's also attractive for travel, secure networks, or environments without internet access.
Sounds good in theory.
But is it a workable solution in practice?
My short answer: no. At least, not on my machine.
Copilot CLI works best when it has a large context window available, which in practice means at least 48 GB of VRAM is recommended. If your GPU isn't powerful enough, this is unfortunately not a workable solution.
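Two quick checks tell you where you stand: how much GPU memory you actually have, and what context length your chosen model supports (ollama show prints the model's metadata):
# available and used GPU memory (NVIDIA)
nvidia-smi --query-gpu=memory.total,memory.used --format=csv
# model details, including context length
ollama show devstral-small-2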
Final thoughts
Copilot CLI already feels like a glimpse of the future of agentic development. Pairing it with Ollama brings that future fully onto your machine: private, (fast), customizable, and deeply integrated with your workflow.
If you have a powerful GPU available on your local machine, this setup is a developer‑friendly way to get started locally.