Microsoft Agent Framework - Workflow lifetime

While building a workflow system with the Microsoft Agent Framework, I hit the following exception when testing my workflow:

System.InvalidOperationException: Cannot use a Workflow that is
already owned by another runner or parent workflow.
   at Microsoft.Agents.AI.Workflows.Workflow.TakeOwnership(...)
   at InProcessRunnerContext..ctor(...)
   at InProcessRunner.CreateTopLevelRunner(...)

It's cryptic if you haven't seen it before. But once you understand the ownership model, the fix is straightforward.

What's actually happening

The workflow runtime treats each workflow instance as a stateful, non-reentrant object. When a runner picks up a workflow, it takes exclusive ownership of that instance. No other runner, and no parent workflow, is allowed to touch it while that ownership is held.

This is by design. Workflows accumulate state as they execute, and allowing two runners to share that state simultaneously would corrupt it. The runtime enforces ownership as a hard constraint, not a soft lock.
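
To make the ownership rule concrete, here is a minimal sketch of how such a guard could work. SketchWorkflow and its two methods are hypothetical stand-ins written for illustration, not the SDK's actual Workflow class or its TakeOwnership implementation; only the exception message is copied from the stack trace above.

```csharp
using System;
using System.Threading;

var runnerA = new object();
var runnerB = new object();
var workflow = new SketchWorkflow();

// The first runner claims the instance.
workflow.TakeOwnership(runnerA);

try
{
    // A second claim while the first is still held is rejected.
    workflow.TakeOwnership(runnerB);
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// Once the first runner releases, the instance can be claimed again.
workflow.ReleaseOwnership(runnerA);
workflow.TakeOwnership(runnerB);

// Hypothetical stand-in for a workflow instance (assumption, not the SDK type).
sealed class SketchWorkflow
{
    private object? _owner;

    public void TakeOwnership(object owner)
    {
        // Atomically set _owner only if nobody owns the instance yet.
        var previous = Interlocked.CompareExchange(ref _owner, owner, null);
        if (previous is not null)
        {
            throw new InvalidOperationException(
                "Cannot use a Workflow that is already owned by another runner or parent workflow.");
        }
    }

    public void ReleaseOwnership(object owner)
    {
        // Clear _owner only if the caller is the current owner.
        Interlocked.CompareExchange(ref _owner, null, owner);
    }
}
```

The compare-and-swap is what makes this a hard constraint: the second caller never waits for the instance, it fails immediately.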

The problem almost always has the same root cause: the workflow is registered as a singleton in the dependency injection container, and two requests arrive before the first one finishes. Both try to take ownership of the same instance — and one of them loses.

Why singletons cause this

When we call AddWorkflow<MyWorkflow>(key) without specifying a lifetime, the SDK often defaults to a keyed singleton. That feels sensible — why spin up a new object every time? But "singleton" means every call shares the same instance. Under any real load, concurrent requests will collide.

The stack trace in our exception confirms this: InProcessRunner.CreateTopLevelRunner calls TakeOwnership, which checks whether the instance is already claimed. It is, so it throws.

Three ways to fix it

1. Create a new instance per run (the simplest fix)

The most direct solution is to stop reusing a single instance. Instead, resolve or construct a fresh workflow object each time a run begins.

Before — singleton:
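
A rough sketch of the failing setup, with the container left out so it runs on its own: a Lazy<MyWorkflow> stands in for a singleton registration, and MyWorkflow here is a hypothetical class whose BeginRun guard only mimics the runtime's ownership check.

```csharp
using System;
using System.Threading;

// Stand-in for a singleton lifetime: every resolve returns the same instance.
var shared = new Lazy<MyWorkflow>(() => new MyWorkflow());
Func<MyWorkflow> resolve = () => shared.Value;

var first = resolve();
var second = resolve();
Console.WriteLine(ReferenceEquals(first, second)); // both requests got the same object

first.BeginRun();        // request 1 starts and has not finished yet
try
{
    second.BeginRun();   // request 2 arrives and hits the same instance
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}

// Hypothetical workflow type; the guard imitates the ownership check.
sealed class MyWorkflow
{
    private int _owned; // 0 = free, 1 = owned

    public void BeginRun()
    {
        if (Interlocked.Exchange(ref _owned, 1) == 1)
            throw new InvalidOperationException("Workflow is already owned by another runner.");
    }

    public void EndRun() => Interlocked.Exchange(ref _owned, 0);
}
```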

After — transient:
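
A rough sketch of the fixed setup, again without a container: a plain Func<MyWorkflow> that builds a new object on every call stands in for a transient registration. MyWorkflow and its BeginRun guard are hypothetical. With Microsoft.Extensions.DependencyInjection the corresponding registrations would be AddTransient or AddKeyedTransient; whether AddWorkflow exposes a lifetime option is something to confirm in the SDK documentation.

```csharp
using System;
using System.Threading;

// Stand-in for a transient lifetime: every resolve builds a new instance.
Func<MyWorkflow> resolve = () => new MyWorkflow();

var first = resolve();
var second = resolve();
Console.WriteLine(ReferenceEquals(first, second)); // each request got its own object

first.BeginRun();   // request 1 is still running
second.BeginRun();  // request 2 owns a separate instance, so nothing collides

// Hypothetical workflow type; the guard imitates the ownership check.
sealed class MyWorkflow
{
    private int _owned; // 0 = free, 1 = owned

    public void BeginRun()
    {
        if (Interlocked.Exchange(ref _owned, 1) == 1)
            throw new InvalidOperationException("Workflow is already owned by another runner.");
    }

    public void EndRun() => Interlocked.Exchange(ref _owned, 0);
}
```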

2. Use a factory pattern

If your workflow needs runtime parameters or non-trivial construction logic, a factory keeps things clean:
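
One possible shape for such a factory is sketched below. IWorkflowFactory, MyWorkflowFactory, and the tenantId parameter are all hypothetical names chosen for illustration; they are not part of the SDK.

```csharp
using System;

var factory = new MyWorkflowFactory();

// Each run asks the factory for its own instance and passes per-run parameters.
var runA = factory.Create("tenant-a");
var runB = factory.Create("tenant-b");

Console.WriteLine(ReferenceEquals(runA, runB));
Console.WriteLine($"{runA.TenantId}, {runB.TenantId}");

// Hypothetical factory abstraction (assumption, not an SDK interface).
interface IWorkflowFactory<TWorkflow>
{
    TWorkflow Create(string tenantId);
}

// The factory holds no per-run state, so the factory itself is safe to share.
sealed class MyWorkflowFactory : IWorkflowFactory<MyWorkflow>
{
    public MyWorkflow Create(string tenantId) => new MyWorkflow(tenantId);
}

// Hypothetical workflow type carrying a runtime parameter.
sealed class MyWorkflow
{
    public MyWorkflow(string tenantId) => TenantId = tenantId;

    public string TenantId { get; }
}
```

The design point is that the long-lived object becomes the factory rather than the workflow, so registering the factory as a singleton no longer shares any run state.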

3. Check ownership timeouts for persisted workflows

If you're using durable/persisted workflows (backed by a database or checkpoint store), the ownership model extends across process boundaries. A previous run may have claimed ownership and crashed without releasing it — leaving the instance locked until the ownership lease expires.

In that case, check two things: whether another process is still actively holding the workflow, and whether your ownership timeout is long enough for your longest-running step. If timeouts are too short, a slow step will lose its lease mid-execution, and the next runner will see the same collision.
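
The lease timing problem can be illustrated with a small in-memory model. OwnershipLease is a hypothetical, single-threaded sketch; a real checkpoint store would perform the same check atomically in the database, and its API will look different. Time is passed in explicitly so the example is deterministic.

```csharp
using System;

var start = DateTimeOffset.UnixEpoch;
var leaseDuration = TimeSpan.FromSeconds(30);
var lease = new OwnershipLease();

// Runner A acquires a 30-second lease, then spends 45 seconds on one step.
lease.TryAcquire("runner-A", start, leaseDuration);

// At 20 seconds the lease is still valid, so runner B is refused.
Console.WriteLine(lease.TryAcquire("runner-B", start.AddSeconds(20), leaseDuration));

// At 45 seconds the lease has expired, so runner B takes over mid-step.
Console.WriteLine(lease.TryAcquire("runner-B", start.AddSeconds(45), leaseDuration));
Console.WriteLine(lease.Holder);

// Hypothetical lease record (assumption, not an SDK type).
sealed class OwnershipLease
{
    public string? Holder { get; private set; }

    public DateTimeOffset ExpiresAt { get; private set; }

    // Grants the lease if it is free, expired, or already held by the same runner.
    public bool TryAcquire(string runnerId, DateTimeOffset now, TimeSpan duration)
    {
        bool available = Holder is null || now >= ExpiresAt || Holder == runnerId;
        if (!available)
        {
            return false;
        }

        Holder = runnerId;
        ExpiresAt = now + duration;
        return true;
    }
}
```

In this model, a lease longer than the 45-second step, or a runner that renews its lease while working, would keep runner B out until runner A finishes.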

Quick checklist

  • Change singleton registrations to transient for workflows that run concurrently
  • Never share a workflow instance across requests — one instance, one run
  • For persisted workflows, check that no other process holds ownership before retrying
  • Tune ownership lease duration to exceed your longest step's expected runtime
  • If sub-workflows are involved, ensure the parent passes the correct ownership signoff token

The ownership model isn't a bug — it's the runtime protecting you from state corruption under concurrency. Once you stop treating workflows like stateless services and give each run its own instance, the exception goes away and everything works as intended.

More information

Microsoft Agent Framework Workflows | Microsoft Learn