
Creating recursion in TPL Dataflow with LinkTo predicates

In the previous post, I showed how to use LinkTo predicates to route messages conditionally across different blocks. Today, we're going to take that concept a step further and do something that surprises most developers the first time they see it:

Link a block back to itself to create recursion — entirely through the dataflow graph, with no explicit recursive method calls.

The core idea

Traditional recursion involves a function calling itself. In TPL Dataflow, we achieve the same result structurally: a block's output is linked back to its own input via a predicate. Messages that match the "recurse" condition loop back, while messages that match the "base case" condition flow forward. The dataflow runtime handles the iteration for us.

Sounds complicated? An example will make it clear immediately.

A good example to illustrate this is walking a directory tree and computing an MD5 hash for every file it contains. Directories need to be expanded (recursed into), while files need to be processed (hashed).

Here's the dataflow graph we're going to build:

(Diagram: folder paths enter a TransformManyBlock; directories loop back into the same block, while files flow forward to an MD5-hashing TransformBlock and a final reporting block.)

Remark: Notice that the TransformManyBlock is the heart of the recursion. When it receives a folder path, it returns all entries inside that folder. Each entry is then routed by predicate: directories loop back into the same block, and files move forward to be hashed.

Step 1: Define the blocks
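
A minimal sketch of the three blocks, assuming the System.Threading.Tasks.Dataflow NuGet package and .NET 5+ (for Convert.ToHexString). The name getFolderContents matches the name used later in this post; computeHash and printResult are illustrative names of my own:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Threading.Tasks.Dataflow;

// Expands a directory into its entries (files and subdirectories).
// This is the block we will later link back to itself.
var getFolderContents = new TransformManyBlock<string, string>(
    path => Directory.Exists(path)
        ? Directory.EnumerateFileSystemEntries(path)
        : Enumerable.Empty<string>());

// Computes an MD5 hash for a single file.
var computeHash = new TransformBlock<string, (string Path, string Hash)>(
    file =>
    {
        using var md5 = MD5.Create();
        using var stream = File.OpenRead(file);
        return (file, Convert.ToHexString(md5.ComputeHash(stream)));
    });

// Terminal stage: report each result.
var printResult = new ActionBlock<(string Path, string Hash)>(
    r => Console.WriteLine($"{r.Hash}  {r.Path}"));
```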

Step 2: Wire up the recursive link

This is where the magic happens. We link getFolderContents back to itself with a predicate that matches directories:
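
A sketch of the wiring, using the block names from Step 1 (getFolderContents is the post's name; computeHash and printResult are illustrative). The predicate overload of LinkTo and DataflowBlock.NullTarget are standard TPL Dataflow APIs:

```csharp
// 1. Directories loop back into the same block -- the recursive link.
getFolderContents.LinkTo(getFolderContents, path => Directory.Exists(path));

// 2. Files flow forward to the hashing stage. Completion can safely
//    propagate along this link because it is not part of the cycle.
getFolderContents.LinkTo(
    computeHash,
    new DataflowLinkOptions { PropagateCompletion = true },
    path => File.Exists(path));

// 3. Anything matching neither predicate (e.g. a broken symlink) is
//    discarded, so the output buffer can't stall on an unroutable message.
getFolderContents.LinkTo(DataflowBlock.NullTarget<string>());

computeHash.LinkTo(printResult,
    new DataflowLinkOptions { PropagateCompletion = true });
```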

The order of these two LinkTo calls matters. TPL Dataflow evaluates predicates in the order the links were created. By placing the directory check first, we ensure subdirectories are always caught before the file check is evaluated.

Step 3: Kick it off

We post a single root folder path. The recursion unwinds naturally: as subdirectories are expanded and eventually every entry resolves to a file, no new messages loop back. One subtlety: calling Complete() immediately after posting would stop the block from accepting its own looped-back messages, so in a cyclic graph completion has to be deferred until no directory is still in flight, for example by counting outstanding directory messages.
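
One workable sketch of the kick-off (my own; the counter-based completion is a common pattern for cyclic dataflow graphs, not necessarily the original post's code): wrap the Step 1 expansion delegate so it tracks how many directory messages are still in flight, and call Complete() only when that count reaches zero.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks.Dataflow;

int outstanding = 1; // the root directory we are about to post

TransformManyBlock<string, string> getFolderContents = null!;

// The Step 1 expansion delegate, adapted to maintain the in-flight counter.
IEnumerable<string> Expand(string path)
{
    foreach (var entry in Directory.EnumerateFileSystemEntries(path))
    {
        if (Directory.Exists(entry))
            Interlocked.Increment(ref outstanding); // another expansion pending
        yield return entry;
    }

    // This directory is fully expanded. If it was the last one in flight,
    // the cycle is quiescent: no looped-back message can arrive anymore,
    // so completing the block is now safe.
    if (Interlocked.Decrement(ref outstanding) == 0)
        getFolderContents.Complete();
}

getFolderContents = new TransformManyBlock<string, string>(Expand);

getFolderContents.Post(".");  // illustrative root path
// With PropagateCompletion set on the forward links (Step 2), awaiting the
// final block waits until the entire tree has been hashed:
// await printResult.Completion;
```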

How the recursion actually terminates

This is the part that confuses people. There is no explicit base-case check or recursion depth limit. Termination relies on two things:

First, the structure of the data guarantees it. A file system is a finite tree. Every directory contains a finite set of entries, and eventually every path resolves to a file (or an empty directory that produces zero output from TransformManyBlock). So the loop-back link naturally stops receiving new messages.

Second, TransformManyBlock returns an empty enumerable for empty directories. When a block produces no output, nothing is posted back, so no new work is generated on that branch. This is the dataflow equivalent of a base case returning immediately.
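
This "empty output = base case" behavior is easy to see in isolation. A toy sketch (not the file-system pipeline): a TransformManyBlock whose delegate returns an empty sequence emits nothing, so after Complete() its Completion task finishes even with no consumer attached.

```csharp
using System.Linq;
using System.Threading.Tasks.Dataflow;

// Stand-in for expanding an empty directory: zero output messages.
var expand = new TransformManyBlock<string, string>(
    _ => Enumerable.Empty<string>());

expand.Post("empty-folder");
expand.Complete();
await expand.Completion;  // completes: nothing is buffered, no loop-back fires
```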

Why not just use a recursive method?

You might wonder why we'd go through all this instead of a simple recursive method. Here's where the dataflow approach shines:

Parallelism is one setting away. With MaxDegreeOfParallelism configured, each TransformBlock and TransformManyBlock can process multiple messages concurrently. While one directory is being expanded, other directories at the same level can be expanded simultaneously, and files can already be flowing into the hash computation stage. A naive recursive method processes everything sequentially unless you manually manage threads.

Backpressure is built in. Give the blocks a BoundedCapacity and, if the hash computation stage can't keep up, the upstream blocks will naturally slow down. You don't need to implement throttling or semaphores yourself.
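
Both knobs live on ExecutionDataflowBlockOptions. A sketch, with illustrative values and the hashing stage reduced to a pass-through placeholder:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var options = new ExecutionDataflowBlockOptions
{
    // Hash several files at once (default is 1, i.e. sequential).
    MaxDegreeOfParallelism = Environment.ProcessorCount,
    // Bound the input buffer: when it is full, upstream blocks hold
    // their output until the hashing stage has room again.
    BoundedCapacity = 100
};

var computeHash = new TransformBlock<string, string>(
    path => path /* hash the file here */, options);
```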

The pipeline stays flat. No matter how deeply nested the directory tree is, your code doesn't grow a deeper call stack. Each iteration is just another message posted to the block's internal buffer.
