In the previous post we went deep on sessions — how to create, persist, resume, and manage them in .NET. All of that assumes you have a running application talking to a Copilot CLI. In development, that's trivial: the SDK starts the CLI for you automatically. In production, the picture is more complex.
This post is about what happens between "it works on my machine" and "it's serving real users." We'll look at how the CLI architecture actually works, when to run the CLI as a separate headless server, the isolation patterns that fit different application types, and how to scale horizontally without losing session state.
How the SDK talks to the CLI
Before making deployment decisions, it helps to understand the communication model. Every SDK in every language works the same way underneath:
Your Application
↓
SDK Client
↓
JSON-RPC
↓
Copilot CLI (server mode)
All SDKs communicate with the Copilot CLI server via JSON-RPC. The CLI is doing the heavy lifting — model routing, tool execution, context management, streaming. The SDK client is a typed interface to that process.
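To make the transport a little more concrete, here is a minimal sketch of a JSON-RPC 2.0 request envelope built with System.Text.Json. The envelope fields (jsonrpc, id, method, params) come from the JSON-RPC 2.0 specification. The class name JsonRpcSketch is hypothetical, and any method name or parameters you pass in (for example "session.create") are placeholders, not the CLI's actual wire protocol, which the SDK hides from you.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class JsonRpcSketch
{
    // Builds a JSON-RPC 2.0 request string. Only the envelope shape is taken
    // from the spec; method names and params supplied by callers are illustrative.
    public static string BuildRequest(int id, string method, object parameters)
    {
        var envelope = new Dictionary<string, object>
        {
            ["jsonrpc"] = "2.0",   // protocol version marker required by the spec
            ["id"] = id,           // correlates the response with this request
            ["method"] = method,
            ["params"] = parameters
        };
        return JsonSerializer.Serialize(envelope);
    }
}
```

Calling JsonRpcSketch.BuildRequest(1, "session.create", new { model = "gpt-4.1" }) yields a single JSON object carrying those four fields. In practice you never build these messages yourself; the typed SDK client does it for you.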
By default, the SDK spawns the CLI as a child process. This suits standalone tools and scenarios where each application instance manages its own CLI lifecycle, and it is the mode we've been using throughout this series: you call StartAsync() and the SDK handles everything.
For production backends, you typically want to separate the application from the CLI process it talks to.
Mode 1: Auto-managed CLI (default)
The bundled, auto-managed CLI is the right choice for:
- Desktop applications and distributable utilities
- Developer tools that run on a single machine
- Scripts and local automation
- Early-stage prototypes
The .NET SDK uses MSBuild targets to automatically download the correct platform-specific CLI binary during the build process, mapping .NET Runtime Identifiers to Copilot platform names and fetching the binary for you. The output binary is placed in the runtimes/{rid}/native folder.
This means zero manual installation — dotnet add package GitHub.Copilot.SDK is all you need, and the right CLI binary for your platform is bundled automatically.
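If you want to check where the bundled binary ended up, a small diagnostic helper can compute the expected folder. This is a sketch under the layout described above (runtimes/{rid}/native below the application's base directory). The class and method names are hypothetical, and the RID reported at runtime may not match the one used at build time on every runtime version.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

public static class BundledCliLocator
{
    // Combines a base directory and a RID into <base>/runtimes/<rid>/native.
    public static string GetNativeFolder(string baseDirectory, string rid)
        => Path.Combine(baseDirectory, "runtimes", rid, "native");

    // Uses the running process's base directory and its reported RID.
    public static string GetNativeFolderForCurrentProcess()
        => GetNativeFolder(AppContext.BaseDirectory, RuntimeInformation.RuntimeIdentifier);
}
```

Listing that folder with Directory.Exists and Directory.GetFiles is a quick way to confirm the build step placed a binary there.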
// The CLI starts and stops with your app — no separate process to manage
await using var client = new CopilotClient();
await client.StartAsync();
// ... your application logic ...
await client.StopAsync();
The trade-off is that the CLI lifecycle is tied to your application process. Every instance of your application runs its own CLI. For single-user tools, this is fine. For multi-instance web APIs, you're running N CLI processes for N replicas — which is wasteful and creates session continuity problems.
Remark: This also works for the other available languages, with the exception of Go, where you need to download the CLI yourself.
Mode 2: Headless CLI server
For backend services — web APIs, background workers, internal tools, anything running on a server — you decouple the CLI from your application by running it in headless server mode.
In this setup the SDK no longer spawns a child process. The CLI runs as a persistent server, started once rather than per request, and your backend connects to it over TCP using the CliUrl option. Multiple SDK clients can share one CLI server.
Start the CLI in headless mode:
# Fixed port
copilot --headless --port 4321
# Or let it pick a random port (prints the URL on startup)
copilot --headless
# Output: Listening on http://localhost:52431
Note: By default the headless server only accepts connections from loopback (127.0.0.1). To accept connections from other hosts, for example from another machine on your network, bind to a non-loopback address with --host.
Then connect your application to it:
var client = new CopilotClient(new CopilotClientOptions
{
    CliUrl = "localhost:4321"
});
// No StartAsync needed — CLI is already running
await client.ConnectAsync();
Running as a Docker container
For production, run the CLI as a system service or in a container. The simplest Docker Compose setup — one CLI container, one API container, shared session storage:
version: "3.8"
services:
  copilot-cli:
    image: ghcr.io/github/copilot-cli:latest
    command: ["--headless", "--port", "4321"]
    environment:
      - COPILOT_GITHUB_TOKEN=${COPILOT_GITHUB_TOKEN}
    ports:
      - "4321:4321"
    restart: always
    volumes:
      - session-data:/root/.copilot/session-state
  api:
    build: .
    environment:
      - CLI_URL=copilot-cli:4321
    depends_on:
      - copilot-cli
    ports:
      - "8080:8080"
volumes:
  session-data:
Your .NET API reads CLI_URL from the environment and connects:
var cliUrl = Environment.GetEnvironmentVariable("CLI_URL") ?? "localhost:4321";
builder.Services.AddSingleton(sp => new CopilotClient(new CopilotClientOptions
{
    CliUrl = cliUrl
}));
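One practical wrinkle: the short form of depends_on in Compose only orders container startup; it does not wait until the CLI is actually listening. A minimal sketch of a readiness wait follows, assuming CLI_URL has the host:port shape used above. The class and method names are hypothetical, and a successful TCP connect only shows that the port is open, not that the CLI is healthy.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

public static class CliReadiness
{
    // Splits "host:port" into its two parts, throwing on malformed input.
    public static (string Host, int Port) ParseHostPort(string cliUrl)
    {
        var idx = cliUrl.LastIndexOf(':');
        if (idx <= 0 || !int.TryParse(cliUrl[(idx + 1)..], out var port))
            throw new FormatException($"Expected host:port, got '{cliUrl}'.");
        return (cliUrl[..idx], port);
    }

    // Retries a plain TCP connect until it succeeds or the attempts run out.
    public static async Task<bool> WaitForPortAsync(
        string host, int port, int attempts = 10, int delayMs = 500,
        CancellationToken ct = default)
    {
        for (var i = 0; i < attempts; i++)
        {
            try
            {
                using var tcp = new TcpClient();
                await tcp.ConnectAsync(host, port, ct);
                return true;
            }
            catch (SocketException)
            {
                // Port not reachable yet: wait, then try again.
                await Task.Delay(delayMs, ct);
            }
        }
        return false;
    }
}
```

An API could call ParseHostPort on the CLI_URL value and await WaitForPortAsync during startup, before the first request tries to reach the CLI.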
Isolation patterns
Once you're running a headless CLI, you have a choice to make: how much isolation do users get from each other? The official docs frame this as three distinct patterns, each with different resource costs and security boundaries.
Pattern 1: Shared CLI, isolated sessions
Multiple users share one CLI server. Each user gets their own session with a unique SessionId. The CLI process is shared; the conversation context is not.
// Each user gets a session scoped to their identity
var session = await _client.CreateSessionAsync(new SessionConfig
{
    SessionId = $"user-{userId}-{conversationId}",
    Model = "gpt-4.1"
});
Pros: Lightest on resources. Simple to operate. One CLI to monitor.
Cons: Weakest isolation boundary. All users share the same process memory and file system access context. If one session triggers a crash, it affects everyone.
Best for: Internal tools, developer portals, trusted environments where all users are within the same organization.
Pattern 2: CLI pool, one CLI per user
Each user gets their own dedicated CLI server instance. Sessions for that user always route to their CLI.
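public class CliPool
{
    private readonly Dictionary<string, CopilotClient> _clients = new();
    // Serializes access: Dictionary and the port counter are not thread-safe
    private readonly SemaphoreSlim _gate = new(1, 1);
    private int _nextPort = 5000;

    public async Task<CopilotClient> GetClientForUserAsync(string userId)
    {
        await _gate.WaitAsync();
        try
        {
            if (_clients.TryGetValue(userId, out var existing))
                return existing;

            var port = _nextPort++;
            // SpawnCliAsync is not shown here; it is expected to start
            // "copilot --headless --port <port>" and wait until it listens
            await SpawnCliAsync(port);
            var client = new CopilotClient(new CopilotClientOptions
            {
                CliUrl = $"localhost:{port}"
            });
            await client.ConnectAsync();
            _clients[userId] = client;
            return client;
        }
        finally
        {
            _gate.Release();
        }
    }

    public async Task ReleaseUserAsync(string userId)
    {
        await _gate.WaitAsync();
        try
        {
            // Disposes the SDK client only; the spawned CLI process
            // still has to be stopped by whoever started it
            if (_clients.Remove(userId, out var client))
                await client.DisposeAsync();
        }
        finally
        {
            _gate.Release();
        }
    }
}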
Pros: Strong isolation — a user's sessions, memory, and processes are completely separated. Users can authenticate with different GitHub tokens.
Cons: Higher resource cost. N active users = N CLI processes. Requires process lifecycle management.
Best for: Multi-tenant SaaS products where data isolation is a hard requirement. Users with different authentication credentials.
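The pool above calls a SpawnCliAsync helper that the post does not show. As a rough sketch of the process-management part, the helper below builds the copilot --headless --port command from earlier in this post and asks the OS for a free port instead of counting up from 5000. The class and method names are hypothetical, and the sketch assumes the copilot executable is on the PATH.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Net.Sockets;

public static class CliSpawner
{
    // Asks the OS for a currently free loopback port by binding to port 0.
    // Another process could claim it before the CLI does, so this is best effort.
    public static int FindFreePort()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try { return ((IPEndPoint)listener.LocalEndpoint).Port; }
        finally { listener.Stop(); }
    }

    // Describes "copilot --headless --port <port>" without starting it.
    public static ProcessStartInfo BuildStartInfo(int port, string executable = "copilot")
    {
        var psi = new ProcessStartInfo(executable) { UseShellExecute = false };
        psi.ArgumentList.Add("--headless");
        psi.ArgumentList.Add("--port");
        psi.ArgumentList.Add(port.ToString());
        return psi;
    }

    // Starts the CLI; the caller owns the Process and must stop it on release.
    public static Process Spawn(int port)
        => Process.Start(BuildStartInfo(port))
           ?? throw new InvalidOperationException("Failed to start the CLI process.");
}
```

Keeping the returned Process next to the CopilotClient in the pool makes it possible to stop the CLI when a user is released, which addresses the lifecycle cost listed under the cons.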
Pattern 3: Shared CLI with namespaced session IDs (lightweight middle ground)
A pragmatic middle option: one shared CLI, but session IDs are structured to encode ownership, so auditing and cleanup are straightforward and your application can check that a session belongs to the user asking for it.
// Session ID encodes tenant, user, and purpose
string CreateSessionId(string tenantId, string userId, string taskType)
    => $"{tenantId}-{userId}-{taskType}-{DateTimeOffset.UtcNow.ToUnixTimeSeconds()}";
// e.g. acme-alice-code-review-1714900000
var sessionId = CreateSessionId("acme", "alice", "code-review");
This doesn't give you process-level isolation, but it does give you clean logical separation and makes audit logs and session cleanup tractable.
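A naming convention only helps if something enforces it. The sketch below keeps the tenant-user-task-timestamp layout and adds parsing plus an ownership check. All type and method names are hypothetical. Because task types such as code-review contain the separator themselves, the sketch assumes tenant and user IDs never contain a hyphen and treats everything between the user and the trailing timestamp as the task type.

```csharp
using System;

public readonly record struct SessionOwner(
    string TenantId, string UserId, string TaskType, long CreatedUnix);

public static class SessionIds
{
    public static string Create(string tenantId, string userId, string taskType, DateTimeOffset now)
    {
        // A '-' inside the tenant or user would make the ID ambiguous to parse.
        if (string.IsNullOrEmpty(tenantId) || tenantId.Contains('-'))
            throw new ArgumentException("Must be non-empty and free of '-'.", nameof(tenantId));
        if (string.IsNullOrEmpty(userId) || userId.Contains('-'))
            throw new ArgumentException("Must be non-empty and free of '-'.", nameof(userId));
        return $"{tenantId}-{userId}-{taskType}-{now.ToUnixTimeSeconds()}";
    }

    public static bool TryParse(string sessionId, out SessionOwner owner)
    {
        owner = default;
        var parts = sessionId.Split('-');
        // Need tenant, user, at least one task segment, and a numeric timestamp.
        if (parts.Length < 4 || !long.TryParse(parts[^1], out var created))
            return false;
        owner = new SessionOwner(parts[0], parts[1], string.Join('-', parts[2..^1]), created);
        return true;
    }

    // True only when the ID parses and names this tenant and user.
    public static bool IsOwnedBy(string sessionId, string tenantId, string userId)
        => TryParse(sessionId, out var o) && o.TenantId == tenantId && o.UserId == userId;
}
```

Calling IsOwnedBy before resuming a session is what turns the convention into an actual guard, since the shared CLI itself has no notion of tenants.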
Scaling horizontally
A single headless CLI can handle many concurrent sessions. But once your load grows beyond what one instance can handle, you need multiple CLI replicas — and that means thinking carefully about session state.
The core trade-off: with shared storage, any CLI replica can handle any session, so load spreads more evenly, but ~/.copilot/session-state/ must live on networked storage that every replica can reach.
Without shared storage, a session created on replica A can only be resumed by replica A. Add a load balancer and traffic can land anywhere — which breaks resumability unless you use sticky sessions (fragile) or shared state (correct).
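To see why sticky routing is fragile, consider the simplest version of it: hashing the session ID to pick a replica. The sketch below is an illustration only, with hypothetical names, and is not how any particular load balancer works. It uses SHA-256 rather than string.GetHashCode(), because the latter is randomized per process and would route the same session differently on each API instance.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StickyRouting
{
    // Maps a session ID to a replica index using a process-independent hash.
    public static int PickReplica(string sessionId, int replicaCount)
    {
        if (replicaCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(replicaCount));
        var hash = SHA256.HashData(Encoding.UTF8.GetBytes(sessionId));
        var value = BitConverter.ToUInt32(hash, 0);   // first 4 bytes of the digest
        return (int)(value % (uint)replicaCount);
    }
}
```

The mapping is stable only while replicaCount stays the same. Scale from three replicas to four, or lose one, and many sessions hash to a replica that has never seen their state, which is exactly the failure shared storage avoids.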
Kubernetes deployment
The following deploys three CLI replicas sharing a PersistentVolumeClaim so that any replica can resume any session:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: copilot-cli
spec:
  replicas: 3
  selector:
    matchLabels:
      app: copilot-cli
  template:
    metadata:
      labels:
        app: copilot-cli
    spec:
      containers:
        - name: copilot-cli
          image: ghcr.io/github/copilot-cli:latest
          args: ["--headless", "--port", "4321"]
          env:
            - name: COPILOT_GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: copilot-secrets
                  key: github-token
          ports:
            - containerPort: 4321
          volumeMounts:
            - name: session-state
              mountPath: /root/.copilot/session-state
      volumes:
        - name: session-state
          persistentVolumeClaim:
            claimName: copilot-sessions-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: copilot-cli
spec:
  selector:
    app: copilot-cli
  ports:
    - port: 4321
      targetPort: 4321
Your .NET deployment then points at the copilot-cli service:
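apiVersion: apps/v1
kind: Deployment
metadata:
  name: copilot-api
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: copilot-api
  template:
    metadata:
      labels:
        app: copilot-api
    spec:
      containers:
        - name: api
          image: your-registry/copilot-api:latest
          env:
            - name: CLI_URL
              value: "copilot-cli:4321"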
The PVC gives every CLI replica access to the same session state directory. Any replica can resume any session. The load balancer distributes both CLI and API traffic freely.
Important: For the PVC approach to work, the volume must support the ReadWriteMany access mode, which means NFS, Azure Files, or a cloud-native equivalent. Standard block storage (AWS EBS, Azure Disk) is ReadWriteOnce and won't work for shared access across pods scheduled on different nodes.
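The CLI Deployment references a claim named copilot-sessions-pvc that is not defined in the manifests above. A minimal sketch of such a claim could look like the following. The storage class name and the requested size are placeholders, not recommendations; substitute whatever ReadWriteMany-capable class your cluster provides.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: copilot-sessions-pvc
spec:
  accessModes:
    - ReadWriteMany                # required so every CLI replica can mount it
  storageClassName: shared-files   # hypothetical name; use your cluster's RWX class
  resources:
    requests:
      storage: 10Gi                # arbitrary placeholder size
```

If the claim stays in Pending after you apply it, the usual cause is that the chosen storage class cannot provision ReadWriteMany volumes.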
What's next
This post covered a lot of ground on deployment, but some related topics remain. In the next post, we'll look at authentication and observability, two key elements of any production-ready setup.