In the previous post, I showed how to use LinkTo predicates to route messages conditionally across different blocks. Today, we're going to take that concept a step further and do something that surprises most developers the first time they see it:
Link a block back to itself to create recursion — entirely through the dataflow graph, with no explicit recursive method calls.
The core idea
Traditional recursion involves a function calling itself. In TPL Dataflow, we achieve the same result structurally: a block's output is linked back to its own input via a predicate. Messages that match the "recurse" condition loop back, while messages that match the "base case" condition flow forward. The dataflow runtime handles the iteration for us.
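In its smallest form, the pattern looks like this. The snippet below is a toy sketch (the names `countdown` and `printer` are mine, not from the article; it needs the `System.Threading.Tasks.Dataflow` package):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// A toy block: for each n it receives, it emits n - 1 (and nothing for 0).
var countdown = new TransformManyBlock<int, int>(
    n => n > 0 ? new[] { n - 1 } : Array.Empty<int>());

var printer = new ActionBlock<int>(
    n => Console.WriteLine($"reached the base case: {n}"));

// The recursive link: positive outputs loop straight back into the block…
countdown.LinkTo(countdown, n => n > 0);
// …while the base case flows forward.
countdown.LinkTo(printer, n => n == 0);

countdown.Post(3);      // 3 → 2 → 1 → 0, and 0 is routed to the printer
await Task.Delay(100);  // crude wait for a sketch; real code tracks completion
```

No method ever calls itself; the loop exists only as an edge in the graph.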
Sounds complicated? An example will make it clear immediately.
A good example is walking a directory tree and computing an MD5 hash for every file in it. Directories need to be expanded (recursed into), while files need to be processed (hashed).
Here's the dataflow graph we're going to build:
Remark: Notice that the TransformManyBlock is the heart of the recursion. When it receives a folder path, it returns all entries inside that folder. Each entry is then routed by predicate: directories loop back into the same block, and files move forward to be hashed.
Step 1: Define the blocks
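A hedged reconstruction of the three blocks (the names `getFolderContents`, `computeHash`, and `printResult` are assumptions based on the graph above; requires the `System.Threading.Tasks.Dataflow` package and .NET 5+ for `Convert.ToHexString`):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks.Dataflow;

// Expands a directory path into its immediate entries: files and
// subdirectories alike. An empty directory yields an empty sequence.
var getFolderContents = new TransformManyBlock<string, string>(
    path => Directory.EnumerateFileSystemEntries(path));

// Hashes one file; allowed to work on several files concurrently.
var computeHash = new TransformBlock<string, (string Path, string Hash)>(
    filePath =>
    {
        using var md5 = MD5.Create();
        using var stream = File.OpenRead(filePath);
        return (filePath, Convert.ToHexString(md5.ComputeHash(stream)));
    },
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount
    });

// Final stage: report each (path, hash) pair.
var printResult = new ActionBlock<(string Path, string Hash)>(
    r => Console.WriteLine($"{r.Hash}  {r.Path}"));
```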
Step 2: Wire up the recursive link
This is where the magic happens. We link getFolderContents back to itself with a predicate that matches directories:
The order of these two LinkTo calls matters. A source offers each message to its linked targets in the order the links were created (with the default Append behavior), so by placing the directory link first we guarantee subdirectories are caught before the file check is ever evaluated. One related gotcha: a message that matches no predicate sits in the block's output buffer forever and stalls the whole pipeline, so it's good practice to end with a catch-all link to DataflowBlock.NullTarget&lt;string&gt;().
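A sketch of the wiring, assuming the block names `getFolderContents`, `computeHash`, and `printResult` from the graph above:

```csharp
using System.IO;
using System.Threading.Tasks.Dataflow;

// 1) The recursive link: subdirectories loop back into the same block.
getFolderContents.LinkTo(getFolderContents, path => Directory.Exists(path));

// 2) Files flow forward to the hashing stage.
getFolderContents.LinkTo(computeHash,
    new DataflowLinkOptions { PropagateCompletion = true },
    path => File.Exists(path));

// 3) Catch-all: an entry deleted mid-walk matches neither predicate and
//    would otherwise clog the output buffer, stalling the block.
getFolderContents.LinkTo(DataflowBlock.NullTarget<string>());

// 4) Hashes flow on to the final stage.
computeHash.LinkTo(printResult,
    new DataflowLinkOptions { PropagateCompletion = true });
```

Note that completion is propagated only along the forward links, never along the self-link.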
Step 3: Kick it off
We post a single root folder path to start the recursion. One subtlety: we can't call Complete() right away, because a completed block declines all further input — including the subdirectories looping back into it. Instead, we track how many paths are still in flight and complete the block only when that count reaches zero. At that point every path has resolved to a file, nothing can loop back, and the pipeline drains on its own.
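One common way to sketch the completion tracking is to fold a pending-work counter into the expansion delegate itself (this replaces the plain delegate from Step 1; the counter logic is my sketch, not a dedicated Dataflow API):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks.Dataflow;

// Paths posted but not yet fully expanded. Starts at 1 for the root.
int pending = 1;

TransformManyBlock<string, string> getFolderContents = null!;
getFolderContents = new TransformManyBlock<string, string>(path =>
{
    var entries = Directory.EnumerateFileSystemEntries(path).ToList();

    // Every subdirectory we emit will loop back, so count it before it does.
    Interlocked.Add(ref pending, entries.Count(Directory.Exists));

    // This path is now fully expanded. If nothing else is in flight,
    // no message can ever loop back again — safe to complete.
    if (Interlocked.Decrement(ref pending) == 0)
        getFolderContents.Complete();

    return entries;
});

// …wire up the links exactly as in Step 2…

getFolderContents.Post(".");  // root folder — substitute your own path
// With PropagateCompletion on the forward links, awaiting the last
// block's Completion waits for the whole pipeline to drain.
```

The increment-children-then-decrement-self ordering matters: it guarantees the counter never falsely hits zero while a subdirectory is still waiting to loop back.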
How the recursion actually terminates
This is the part that confuses people. There is no explicit base-case check or recursion depth limit. Termination relies on two things:
First, the structure of the data guarantees it. A file system is a finite tree. Every directory contains a finite set of entries, and eventually every path resolves to a file (or an empty directory that produces zero output from TransformManyBlock). So the loop-back link naturally stops receiving new messages.
Second, TransformManyBlock returns an empty enumerable for empty directories. When a block produces no output, nothing is posted back, so no new work is generated on that branch. This is the dataflow equivalent of a base case returning immediately.
Why not just use a recursive method?
You might wonder why we'd go through all this instead of a simple recursive method. Here's where the dataflow approach shines:
Parallelism is automatic. Each TransformBlock and TransformManyBlock can process multiple messages concurrently. While one directory is being expanded, other directories at the same level can be expanded simultaneously, and files can already be flowing into the hash computation stage. A naive recursive method processes everything sequentially unless you manually manage threads.
Backpressure is built in. If the hash computation stage can't keep up, the upstream blocks will naturally slow down. You don't need to implement throttling or semaphores yourself.
The pipeline stays flat. No matter how deeply nested the directory tree is, your code doesn't grow a deeper call stack. Each iteration is just another message posted to the block's internal buffer.
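Both of the first two knobs are plain block options rather than code restructuring. A sketch (the option names are real TPL Dataflow API; the values and the `HashFile` helper are illustrative):

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var options = new ExecutionDataflowBlockOptions
{
    // Process up to one message per core concurrently.
    MaxDegreeOfParallelism = Environment.ProcessorCount,
    // Buffer at most 100 messages; once full, upstream SendAsync calls
    // wait, which is the built-in backpressure.
    BoundedCapacity = 100
};

// HashFile is a hypothetical string -> string hashing helper.
var computeHash = new TransformBlock<string, string>(
    path => HashFile(path), options);
```

One caution: a bounded capacity on the self-linked block itself can deadlock the recursion, since the block may be unable to accept its own looped-back output. Bounding is safest on the forward stages.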