I try to create well-designed TPL dataflow pipeline with optimal using of system resources. My project is a HTML parser that adds parsed values into SQL Server DB. I already have all methods of my future pipeline, and now my question is what is the optimal way to place them in Dataflow blocks, and how much blocks i should use? Some of methods are CPU-bound, and some of them - I/O-bound(loading from Internet, SQL Server DB queries). For now I think that placing each I/O operation in separate block is the right way like on this scheme:
What are the basic rules of designing pipelines in that case?
One way to choose how to divide the blocks is to decide which parts you want to scale independently of the others. A good starting point is to divide the CPU-bound portions from the I/O-bound portions. I'd consider combining the last two blocks, since they are both I/O-bound (presumably to the same database).