c# .net task-parallel-library tpl-dataflow

Wait for previous blocks to finish processing before continuing

I have a process which looks like this.

1. Get a set of CSV files from a folder.
2. Read the CSV files, and store the contents in a database.
3. Read the data from the database and perform some more processing.

The reason for the separation of steps 2 & 3 is to separate issues involved with reading the files from issues involved with processing the files.

I can model this with three dataflow blocks. The problem is that I don't want block 3 to start until all files have been persisted to the database, so I need some way of determining that every file picked up in block 1 has been processed by block 2. Block 2 will have its MaxDegreeOfParallelism set to Unbounded, because I want the files processed in parallel.

I considered using Encapsulate on the first two blocks, but I don't think that would work. Perhaps I need some kind of BatchBlock, but the batches are not all going to be the same size.

How can I do this? Do I need to create my own block type?

Solution

This doesn't fit a single TPL Dataflow (TDF) flow: step #2 doesn't pass items to step #3, and step #3 must start only after the previous steps have completed entirely.

You should have two separate flows. The first reads the files from the folder and stores their contents in the database; the second reads from the database and processes the data. You can wait for the first flow to complete by awaiting the Completion property:

var reader = // Create #1 block
var dbFiller = // Create #2 block

reader.LinkTo(dbFiller, new DataflowLinkOptions { PropagateCompletion = true }); // Link both blocks with Completion Propagation

reader.Post( // Queue up work for reader

reader.Complete(); // Signal that no more work will be posted

await dbFiller.Completion; // Asynchronously wait until the last block of the first flow has finished

var processor = // Create #3 block

processor.Post( // Queue up work for processor
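
For reference, here is a minimal self-contained sketch of the two-flow approach described above. It is only an illustration under stated assumptions: the class name TwoPhasePipeline, the RunAsync method, the in-memory dictionary standing in for the database, and the fake "contents of ..." strings are all hypothetical placeholders, not part of the original question. The point it shows is that Complete() is called on the first block and the Completion of the last block in the first flow is awaited before the second flow is even created.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TwoPhasePipeline
{
    public static async Task<IReadOnlyList<string>> RunAsync(IEnumerable<string> fileNames)
    {
        // Hypothetical stand-in for the database: file name -> stored contents.
        var database = new ConcurrentDictionary<string, string>();

        // Flow 1, block #1: "read" a file (faked here, no real I/O).
        var reader = new TransformBlock<string, KeyValuePair<string, string>>(
            name => new KeyValuePair<string, string>(name, "contents of " + name));

        // Flow 1, block #2: persist to the database, in parallel.
        var dbFiller = new ActionBlock<KeyValuePair<string, string>>(
            row => { database[row.Key] = row.Value; },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
            });

        // Completion of reader flows on to dbFiller.
        reader.LinkTo(dbFiller, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var name in fileNames)
        {
            reader.Post(name);
        }

        // No more input; then wait until every item has been stored.
        reader.Complete();
        await dbFiller.Completion;

        // Flow 2, block #3: starts only after flow 1 has fully completed.
        var results = new ConcurrentBag<string>();
        var processor = new ActionBlock<string>(
            name => results.Add(name + ": " + database[name].Length + " chars"));

        foreach (var name in database.Keys)
        {
            processor.Post(name);
        }

        processor.Complete();
        await processor.Completion;

        // Sort so the result order does not depend on scheduling.
        return results.OrderBy(r => r).ToList();
    }
}
```

From an async method this could be called as `var summary = await TwoPhasePipeline.RunAsync(new[] { "a.csv", "b.csv" });`. Because the second flow is built after `await dbFiller.Completion`, no custom block type or BatchBlock is needed to enforce the ordering.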