
Supercharging GitHub Copilot CLI with Ollama: Local Models, Full Control

GitHub Copilot CLI is my 'go-to' coding agent when I work directly from the terminal. It understands my codebase, proposes edits, runs commands, and helps me move faster without leaving the command line. Since I care about privacy, offline workflows, and custom model experimentation, I decided to try running Copilot CLI entirely on local LLMs using Ollama.

No cloud dependency. No API keys. Just my machine, a local model, and my workflow.

In this post, I’ll walk through how to set it up, and how to use it effectively.

Why combine Copilot CLI with Ollama?

Copilot CLI gives you a powerful agentic interface for your codebase. Ollama gives you a fast, local model runtime with support for dozens of open models.

Together, you get:

  • Local-first AI coding: keep your code and prompts on your machine
  • Predictable performance: no rate limits or network delays
  • Model flexibility: swap between Qwen, Llama, Mistral, Gemma, and more
  • Agentic workflows: Copilot CLI can edit, run, and reason using your local model
  • Offline development: perfect for secure environments or travel

This combination turns your terminal into a fully autonomous coding assistant that respects your boundaries and infrastructure.

Quick Setup: Copilot CLI + Ollama

The integration is surprisingly simple thanks to Ollama’s OpenAI‑compatible API layer.
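If you want to sanity-check that layer before wiring anything together, you can hit the local endpoint directly (a quick sketch; the model name is whatever you've pulled locally):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "devstral-small-2", "messages": [{"role": "user", "content": "Say hello"}]}'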

1. Install Copilot CLI

You can install it via Homebrew, npm, an install script, or WinGet, depending on your platform.

winget install GitHub.Copilot
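On other platforms, npm is the most portable route (assuming Node.js is installed; as far as I know this is the package GitHub publishes for the CLI):

npm install -g @github/copilot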

2. Launch Copilot CLI with Ollama

The fastest way to start:

ollama launch copilot

This spins up Copilot CLI using Ollama’s default model.

To specify a model:

ollama launch copilot --model devstral-small-2

Or any other model you’ve pulled locally.
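If you're not sure what you have locally, the regular Ollama commands let you pull and list models first (standard Ollama commands, independent of the Copilot integration):

ollama pull devstral-small-2    # download the model if you don't have it yet
ollama list                     # show everything available locally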

3. Run Copilot CLI directly

Once launched, you can use Copilot CLI as usual: ask questions, request code edits, or let it analyze your repository.

Remark: you'll have to be patient, as Copilot CLI consumes a lot of tokens, so responses from a local model can take a while.

Manual Setup (Environment Variables)

If you want full control — for example in scripts, CI, or Docker — you can wire Copilot CLI to Ollama manually.

Copilot CLI connects to Ollama via the OpenAI‑compatible API:

export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
export COPILOT_PROVIDER_API_KEY=
export COPILOT_PROVIDER_WIRE_API=responses
export COPILOT_MODEL=devstral-small-2

Then run:

copilot

This is ideal for reproducible environments or when you want to pin a specific model.
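For instance, you could wrap the variables in a small launcher script so every environment pins the same model (an illustrative sketch; the script name is my own choice, not part of either tool):

#!/usr/bin/env sh
# copilot-local.sh - pin Copilot CLI to a local Ollama model
export COPILOT_PROVIDER_BASE_URL=http://localhost:11434/v1
export COPILOT_PROVIDER_API_KEY=
export COPILOT_PROVIDER_WIRE_API=responses
export COPILOT_MODEL=devstral-small-2
exec copilot "$@"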

Non‑interactive mode (CI/CD, Automation)

Copilot CLI also supports headless execution:

ollama launch copilot --model devstral-small-2 --yes -- -p "Explain how this repository works"

The --yes flag:

  • auto‑pulls the model
  • skips interactive prompts
  • requires --model

Everything after -- is passed directly to Copilot CLI.
Perfect for automated documentation, code reviews, or repo analysis. 
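As a concrete example, a CI step could pipe a headless run into a file for later review (a sketch; the prompt and output path are arbitrary choices of mine):

ollama launch copilot --model devstral-small-2 --yes -- -p "Summarize the architecture of this repository" > repo-summary.md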

Choosing the right local model

Ollama supports a wide range of models, and Copilot CLI benefits from models with:

  • Large context windows (64k+ recommended)
  • Strong reasoning
  • Good code understanding

Popular choices:

  • Qwen 3.5 / 3.6 — excellent reasoning and long context
  • Llama 3.1 — balanced performance and speed
  • Mistral Nemo — lightweight and fast
  • DeepSeek Coder — optimized for code generation

Cloud‑backed models also show up in the model list, but for local workflows, the open models above shine.
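Context window is the property that matters most for Copilot CLI, and you can inspect what a pulled model supports with Ollama's show command (a standard Ollama command; the model name is just an example):

ollama show llama3.1

The output lists the model's context length, among other details.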

Is it workable?

Being able to run Copilot CLI locally makes it a compelling solution. Your code never leaves your machine, which makes it ideal for regulated environments.

It's also handy for travel, secure networks, or environments without internet access.

Sounds good in theory. 

But is it a workable solution in practice? 

My short answer: no. At least not on my machine.

Copilot CLI works best with a large context window available, which in practice means at least 48 GB of VRAM is recommended. If your GPU isn't powerful enough, this is unfortunately not a workable solution.
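To see whether your hardware copes, Ollama can report how a loaded model is split between GPU and CPU, and you can trade context size for VRAM headroom (a sketch; the context value below is an assumption you'd tune to your GPU, and I believe OLLAMA_CONTEXT_LENGTH is the relevant Ollama setting):

ollama ps                                  # PROCESSOR column shows the GPU/CPU split of the loaded model
OLLAMA_CONTEXT_LENGTH=32768 ollama serve   # assumption: smaller context to fit in limited VRAM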

Final thoughts

Copilot CLI already feels like a glimpse of the future of agentic development. Pairing it with Ollama brings that future fully onto your machine: private, (fast), customizable, and deeply integrated with your workflow.

If you have a powerful GPU available on your local machine, this setup is a developer‑friendly way to get started locally.

More information

Copilot CLI - Ollama

GitHub Copilot CLI
