c# .net task-parallel-library tpl-dataflow

Wait for previous blocks to finish processing before continuing


I have a process which looks like this.

  1. Get a set of CSV files from a folder
  2. Read the CSV files, and store the contents in a database
  3. Read the data from the database and perform some more processing.

The reason for the separation of steps 2 & 3 is to separate issues involved with reading the files from issues involved with processing the files.

I can model this with three dataflow blocks. The problem I have is that I don't want block 3 to start until all files have been persisted to the database. I need some way of determining that all the files that were picked up in block 1 have been processed by block 2. Block 2 will have its MaxDegreeOfParallelism set to Unbounded, because I want the files processed in parallel.

I considered using Encapsulate on the first two blocks, but I don't think that would work. Perhaps I need some kind of BatchBlock, but the batches are not all going to be the same size.

How can I do this? Do I need to create my own block type?


Solution

  • This doesn't fit a single TPL Dataflow (TDF) flow, because step #2 doesn't pass items to step #3; step #3 only starts after the previous steps have completed.

    You should have 2 separate flows. The first reads from the folder and stores in the database, and the second reads from the database and starts processing. To wait for the first flow to finish, call Complete on its first block once all work has been queued, then await the Completion property of its last block:

    var reader = // Create #1 block
    var dbFiller = // Create #2 block

    // Link both blocks and propagate completion from reader to dbFiller
    reader.LinkTo(dbFiller, new DataflowLinkOptions { PropagateCompletion = true });

    reader.Post( // Queue up work for reader

    reader.Complete(); // Signal that no more work will be posted to reader

    await dbFiller.Completion; // Asynchronously wait until every item has been stored

    var processor = // Create #3 block

    processor.Post( // Queue up work for processor
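
For reference, here is a minimal, self-contained sketch of the two-flow approach, assuming the System.Threading.Tasks.Dataflow package is available. The file names, the stubbed "read" step, and the in-memory `database` collection are hypothetical stand-ins for the real folder and database, which the question does not describe in detail.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class Program
{
    public static async Task Main()
    {
        // Hypothetical stand-in for the database that step 2 fills.
        var database = new ConcurrentBag<string>();

        // Flow 1, block #1: "reads" a CSV file (stubbed; no real file I/O here).
        var reader = new TransformBlock<string, string>(
            path => $"contents of {path}");

        // Flow 1, block #2: stores each file's contents, with unbounded parallelism.
        var dbFiller = new ActionBlock<string>(
            async contents =>
            {
                await Task.Delay(10); // simulated database write
                database.Add(contents);
            },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
            });

        // Completion of reader flows on to dbFiller once reader has drained.
        reader.LinkTo(dbFiller, new DataflowLinkOptions { PropagateCompletion = true });

        // Hypothetical file names standing in for the folder listing.
        foreach (var path in new[] { "a.csv", "b.csv", "c.csv" })
        {
            reader.Post(path);
        }

        reader.Complete();          // no more files will be posted
        await dbFiller.Completion;  // every file has been stored at this point

        // Flow 2, block #3: only created and fed after flow 1 has finished.
        var processed = 0;
        var processor = new ActionBlock<string>(row => { processed++; });

        foreach (var row in database)
        {
            processor.Post(row);
        }

        processor.Complete();
        await processor.Completion;

        Console.WriteLine($"stored={database.Count}, processed={processed}");
    }
}
```

The second flow uses the default MaxDegreeOfParallelism of 1, so the plain counter increment is safe in this sketch; a parallel processor would need Interlocked.Increment or similar.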