In the previous post I set the stage on deploying to production. We covered managing the CLI process, different isolation patterns and how to scale horizontally.
This post covers two other aspects that matter when putting your GitHub Copilot SDK enabled application into production: how to tackle authentication and how to get insight into what is going on inside your agentic system.
Authentication in production
Development uses your personal GitHub credentials via gh auth login. Production backends need a different approach.
Service account token (shared CLI): Set COPILOT_GITHUB_TOKEN as an environment variable on the CLI process. All sessions on that CLI use the same token. Simple, but every user is acting as the service account.
export COPILOT_GITHUB_TOKEN="gho_service_account_token"
copilot --headless --port 4321
Per-user tokens (GitHub OAuth): For multi-tenant applications where users authenticate with their own GitHub identities, pass each user's token when creating their session. This requires implementing the GitHub OAuth flow in your application and is covered in depth in the GitHub OAuth with Copilot SDK docs.
BYOK: If you'd rather not tie deployment to GitHub auth at all, configure the SDK to use your own API keys from OpenAI, Azure AI Foundry, or Anthropic. This sidesteps the Copilot subscription requirement and premium request quota entirely — useful for automated pipelines where per-request billing against a known provider is preferable.
var client = new CopilotClient(new CopilotClientOptions
{
    Byok = new ByokConfig
    {
        Provider = "azure",
        ApiKey = Environment.GetEnvironmentVariable("AZURE_API_KEY")!,
        BaseUrl = Environment.GetEnvironmentVariable("AZURE_ENDPOINT")!
    }
});
Note: BYOK uses key-based authentication only. At the time of writing, Microsoft Entra ID (Azure AD), managed identities, and third-party identity providers are not supported.
Observability
Agents are harder to observe than traditional services because the interesting work happens inside the execution loop. Build instrumentation in from the start.
Session-level metrics
Use the onSessionStart and onSessionEnd hooks to capture duration and end reason for every session:
var sessionStartTimes = new ConcurrentDictionary<string, long>();
var session = await client.CreateSessionAsync(new SessionConfig
{
    Model = "gpt-4.1",
    Hooks = new SessionHooks
    {
        OnSessionStart = async (input, invocation) =>
        {
            // Remember when the session started so the end hook can compute a duration.
            sessionStartTimes[invocation.SessionId] = input.Timestamp;
            _metrics.SessionStarted(invocation.SessionId, input.Source);
            return null;
        },
        OnSessionEnd = async (input, invocation) =>
        {
            // TryRemove also cleans up the entry so the dictionary does not grow unbounded.
            if (sessionStartTimes.TryRemove(invocation.SessionId, out var startTime))
            {
                var duration = input.Timestamp - startTime;
                _metrics.SessionEnded(invocation.SessionId, duration, input.Reason);
            }
            return null;
        }
    }
});
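The _metrics object in the hooks above is not shown in this post. As a minimal sketch of what it could look like, the class below records the two calls with the standard System.Diagnostics.Metrics API. The class name, meter name, and instrument names are assumptions made for illustration, and it assumes that the source and reason values are (or can be converted to) strings and that the timestamps are in milliseconds.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical helper: one possible shape for the _metrics object used in the hooks.
public sealed class SessionMetrics : IDisposable
{
    // Meter and instrument names are illustrative, not defined by the Copilot SDK.
    private readonly Meter _meter = new("MyApp.CopilotSessions");
    private readonly Counter<long> _started;
    private readonly Histogram<double> _durationMs;

    public SessionMetrics()
    {
        _started = _meter.CreateCounter<long>("copilot.sessions.started");
        _durationMs = _meter.CreateHistogram<double>("copilot.session.duration", unit: "ms");
    }

    // The session ID is deliberately not used as a tag: it would create one
    // time series per session in most metrics backends.
    public void SessionStarted(string sessionId, string source) =>
        _started.Add(1, new KeyValuePair<string, object?>("source", source));

    // Assumes durationMs is the difference of two millisecond timestamps.
    public void SessionEnded(string sessionId, long durationMs, string reason) =>
        _durationMs.Record(durationMs, new KeyValuePair<string, object?>("reason", reason));

    public void Dispose() => _meter.Dispose();
}
```

Any OpenTelemetry metrics pipeline that subscribes to the meter name can then export these instruments. If your SDK version exposes the source or reason as an enum, call ToString() before passing it in.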
Tool call tracing
Instrument tool execution events to understand what actions the agent is taking in production — and catch unexpected tool usage early:
session.On(evt =>
{
    switch (evt)
    {
        case ToolExecutionStartEvent toolStart:
            // Log every tool invocation together with the session it belongs to.
            _logger.LogInformation(
                "Tool {Tool} invoked in session {Session}",
                toolStart.Data.ToolName, sessionId);
            break;
        case ToolExecutionCompletedEvent toolCompleted:
            _logger.LogInformation(
                "Tool {Tool} completed in {Duration}ms",
                toolCompleted.Data.ToolName, toolCompleted.Data.DurationMs);
            break;
    }
});
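To catch unexpected tool usage rather than only log it, you can compare each tool name against the set of tools you expect the agent to use. The sketch below is a small SDK-independent helper. ToolUsageAuditor and its members are hypothetical names introduced here for illustration, not part of the Copilot SDK.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper: counts tool invocations and flags tools outside an expected set.
public sealed class ToolUsageAuditor
{
    private readonly HashSet<string> _expected;
    private readonly ConcurrentDictionary<string, int> _counts = new();

    public ToolUsageAuditor(IEnumerable<string> expectedTools) =>
        _expected = new HashSet<string>(expectedTools, StringComparer.Ordinal);

    // Records one invocation. Returns false when the tool is not on the expected list.
    public bool Record(string toolName)
    {
        _counts.AddOrUpdate(toolName, 1, (_, n) => n + 1);
        return _expected.Contains(toolName);
    }

    // Number of invocations recorded so far for the given tool.
    public int CountFor(string toolName) =>
        _counts.TryGetValue(toolName, out var n) ? n : 0;
}
```

In the ToolExecutionStartEvent case you would call Record(toolStart.Data.ToolName) and log a warning whenever it returns false. The expected set is something you maintain yourself, for example the tool names you registered for the session.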
Context compaction
Context compaction is automatic and mostly invisible, but it's worth logging when it happens. Frequent compaction on short conversations can indicate that your system prompt or injected context is consuming more of the context window than you intend.
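One way to turn "frequent compaction on short conversations" into something you can alert on is to count turns per session and check how many had happened when compaction fired. The sketch below is SDK-independent: CompactionMonitor, its methods, and the default threshold of five turns are assumptions for illustration. You would call it from whichever event handlers your SDK version provides for user messages and compaction.

```csharp
using System.Collections.Concurrent;

// Hypothetical helper: flags compaction that happens while a conversation is still short.
public sealed class CompactionMonitor
{
    private readonly int _shortConversationTurns;
    private readonly ConcurrentDictionary<string, int> _turns = new();

    // shortConversationTurns is an arbitrary illustrative threshold; tune it for your workload.
    public CompactionMonitor(int shortConversationTurns = 5) =>
        _shortConversationTurns = shortConversationTurns;

    // Call once per user turn in the session.
    public void RecordTurn(string sessionId) =>
        _turns.AddOrUpdate(sessionId, 1, (_, n) => n + 1);

    // Call when compaction is observed. Returns true when the conversation
    // had at most the configured number of turns, which is worth a warning.
    public bool RecordCompaction(string sessionId)
    {
        var turns = _turns.TryGetValue(sessionId, out var n) ? n : 0;
        return turns <= _shortConversationTurns;
    }

    // Call when the session ends so the dictionary does not grow unbounded.
    public void Forget(string sessionId) => _turns.TryRemove(sessionId, out _);
}
```

A true result is a hint to review the size of your system prompt and injected context, not proof of a problem.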
Built-in OpenTelemetry support
The event-based instrumentation above is useful for quick feedback and custom metrics, but for production observability you want proper distributed traces — spans that link the CLI's internal execution to your application code, visible in Jaeger, Grafana, Azure Monitor, Datadog, or any OTLP-compatible backend.
The SDK ships with this built in. OpenTelemetry support is a first-class feature of the Copilot SDK: it provides distributed tracing with W3C trace context propagation across all SDKs.
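For readers less familiar with W3C trace context, the sketch below shows what an application-side span looks like in .NET, using only System.Diagnostics. AppTracing, the source name "MyApp.Agent", and the helper methods are hypothetical. How the SDK picks up the ambient span is not shown in this post, so treat this purely as an illustration of the trace ID format that links spans together.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: creates application-level spans with W3C-format IDs.
public static class AppTracing
{
    // Source name is illustrative; register the same name with your OpenTelemetry setup.
    public static readonly ActivitySource Source = new("MyApp.Agent");

    // For this sketch only. In a real application the OpenTelemetry SDK registers
    // the listener when you add the source to its tracer provider.
    public static ActivityListener ListenToAll()
    {
        var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == Source.Name,
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllDataAndRecorded
        };
        ActivitySource.AddActivityListener(listener);
        return listener;
    }

    // Starts a span and returns its ID, which on .NET 5+ uses the W3C
    // traceparent layout: version-traceid-spanid-flags.
    public static string? StartRequestSpan(string operation, out Activity? activity)
    {
        activity = Source.StartActivity(operation, ActivityKind.Server);
        return activity?.Id;
    }
}
```

Without a registered listener, StartActivity returns null and no span is created, which is why the helper returns a nullable ID.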
Opting in: one line of config
Enabling OTel tracing requires a single addition to your CopilotClientOptions:
var client = new CopilotClient(new CopilotClientOptions
{
    Telemetry = new TelemetryConfig
    {
        OtlpEndpoint = "http://localhost:4318"
    }
});
That's it. The SDK configures OpenTelemetry on the CLI process and begins exporting spans to your OTLP endpoint. No additional packages required beyond what the SDK already brings in.
Protocol note: The CLI runtime only supports OTLP over HTTP (otlp-http). If your collector is configured for gRPC, the CLI will still use HTTP. Backends that serve both protocols on the same port, like the .NET Aspire Dashboard, work transparently.
What gets traced
Once enabled, every agent interaction produces a hierarchical span tree that captures the full execution flow:
invoke_agent [~15s]
├── chat gpt-4.1 [~3s] ← LLM requests tool calls
├── execute_tool readFile [~50ms]
├── execute_tool runCommand [~2s]
├── chat gpt-4.1 [~4s] ← LLM generates final response
└── (span ends)
Each span captures metadata following the OpenTelemetry GenAI Semantic Conventions — model names, token counts, durations — so the data works with any OTel-compatible backend. By default, no prompt content, responses, or tool arguments are captured: only metadata. If you need full content for debugging, set the COPILOT_OTEL_CAPTURE_CONTENT=true environment variable on the CLI process.
Seeing traces locally: The Aspire dashboard
For local development, the .NET Aspire Dashboard gives you a full trace viewer with a built-in OTLP endpoint — no cloud account needed, no Jaeger to configure:
docker run --rm -d \
  -p 18888:18888 \
  -p 4317:18889 \
  -p 4318:18890 \
  --name aspire-dashboard \
  mcr.microsoft.com/dotnet/aspire-dashboard:latest
The dashboard UI is on port 18888. The 4318:18890 mapping exposes the dashboard's OTLP/HTTP listener on host port 4318, alongside the gRPC listener on 4317. Point your OtlpEndpoint at http://localhost:4318 (the SDK uses HTTP) and open http://localhost:18888 to see traces appear in real time as you send prompts.
Sending to production backends
For production, route traces through an OTel Collector and on to whichever backend you use. A minimal otel-collector-config.yaml that accepts OTLP and exports to multiple destinations:
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
exporters:
  azuremonitor:
    connection_string: "${APPLICATIONINSIGHTS_CONNECTION_STRING}"
  otlp/grafana:
    endpoint: "${GRAFANA_OTLP_ENDPOINT}"
    headers:
      authorization: "Basic ${GRAFANA_AUTH}"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [azuremonitor, otlp/grafana]
Deployment checklist
Let me summarize the current and previous post with a handy pre-deployment checklist:
Architecture
- Using headless CLI mode rather than the default auto-managed subprocess?
- CLI running as a persistent service (systemd, container, or Kubernetes pod)?
- CLI_URL coming from environment configuration, not hardcoded?
Session state
- Running multiple CLI replicas? If so, is session state on shared (ReadWriteMany) storage?
- Session IDs structured to encode ownership for auditability?
- Concurrency limit enforced to prevent memory exhaustion?
Authentication
- Service account token stored as an environment secret, not in source?
- If using per-user auth, GitHub OAuth flow implemented and tested?
- If using BYOK, provider credentials rotated on a schedule?
Observability
- Session start/end metrics captured?
- Tool execution events logged?
- Session errors surfaced to your alerting system?
- Context compaction events monitored?
Resilience
- CLI process managed by a supervisor (systemd, Kubernetes) that restarts on failure?
- Application handling SessionErrorEvent gracefully?
- 30-minute idle timeout accounted for in long-running workflows?
What's next
Your application is now deployable, scalable, and observable. In the next post in the series, we cover MCP integration — connecting your agent to external context and services via the Model Context Protocol, so your agent can reach databases, APIs, and cloud services without you having to build the glue.