I have a process which looks like this:

1. Pick up files from a folder.
2. Persist the contents of each file to a database.
3. Process the persisted files.
The reason for the separation of steps 2 & 3 is to separate issues involved with reading the files from issues involved with processing the files.
I can model this with three dataflow blocks. The problem I have is that I don't want block 3 to start until all files have been persisted to the database. I need some way of determining that all the files that were picked up in block 1 have been processed by block 2. Block 2 will have its MaxDegreeOfParallelism set to Unbounded - I want the files processed in parallel.
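To illustrate, this is roughly how I intend to configure block 2 (FileContents and SaveToDatabaseAsync stand in for my real type and persistence call):

var dbFiller = new ActionBlock<FileContents>(
    contents => SaveToDatabaseAsync(contents),  // placeholder persistence call
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded // persist files in parallel
    });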
I considered using Encapsulate on the first two blocks, but I don't think that would work. Perhaps I need some kind of BatchBlock, but the batches are not all going to be the same size.
How can I do this? Do I need to create my own block type?
This doesn't fit into a single TDF flow, since step #2 doesn't pass items on to step #3; step #3 only starts once all of the previous work has completed.
You should have 2 separate flows. The first reads from the folder and stores the files in the database, and the second reads from the database and processes the files. You can wait for the first flow to complete by calling Complete on its first block and then awaiting the Completion property of its last block:
var reader = ...;    // Create #1 block
var dbFiller = ...;  // Create #2 block
// Link both blocks with completion propagation
reader.LinkTo(dbFiller, new DataflowLinkOptions { PropagateCompletion = true });
reader.Post(...);    // Queue up work for reader
reader.Complete();   // Signal that no more files are coming
// Await the last block of the flow; awaiting reader.Completion alone would
// return before dbFiller has actually persisted everything
await dbFiller.Completion;
var processor = ...; // Create #3 block
processor.Post(...); // Queue up work for processor
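Putting it all together, a complete sketch might look like the following (it requires the System.Threading.Tasks.Dataflow NuGet package). FileContents, ReadFileAsync, SaveToDatabaseAsync, LoadAllFromDatabaseAsync, ProcessAsync and the folder path are placeholders standing in for your actual types and logic, not part of any real API:

using System;
using System.IO;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Flow #1: read each file and persist its contents to the database.
var reader = new TransformBlock<string, FileContents>(
    path => ReadFileAsync(path));
var dbFiller = new ActionBlock<FileContents>(
    contents => SaveToDatabaseAsync(contents),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });
reader.LinkTo(dbFiller, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var path in Directory.EnumerateFiles(@"C:\incoming")) // assumed input folder
    reader.Post(path);
reader.Complete();              // no more files; completion propagates to dbFiller
await dbFiller.Completion;      // every file is now persisted

// Flow #2: read the persisted files back and process them in parallel.
var processor = new ActionBlock<FileContents>(
    contents => ProcessAsync(contents),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded
    });
foreach (var contents in await LoadAllFromDatabaseAsync())
    processor.Post(contents);
processor.Complete();
await processor.Completion;

static async Task<FileContents> ReadFileAsync(string path) =>
    new FileContents(path, await File.ReadAllBytesAsync(path));

static Task SaveToDatabaseAsync(FileContents contents) =>
    Task.CompletedTask;         // stand-in for the real database write

static Task<FileContents[]> LoadAllFromDatabaseAsync() =>
    Task.FromResult(Array.Empty<FileContents>());   // stand-in for the real query

static Task ProcessAsync(FileContents contents) =>
    Task.CompletedTask;         // stand-in for the real processing

record FileContents(string Path, byte[] Data);

Because the two flows are independent, the second one is free to batch, filter, or re-order the work however it likes; the only coupling between them is the awaited Completion of the first flow's last block.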