Tags: c#, parallel-processing, tpl-dataflow

Reasons for Parallelism In One Block vs Multiple Single Threaded Target Blocks in Task Parallel Dataflow


What would be a reason for, or difference between, creating n ITargetBlock<T> workers without parallelism versus one ITargetBlock<T> worker with MaxDegreeOfParallelism = n?

               |-- TransformBlock<T, TOut> BoundedCapacity=1  --|
               |-- TransformBlock<T, TOut> BoundedCapacity=1  --|
BufferBlock<T> |                                                |-- ActionBlock<TOut>
               |-- TransformBlock<T, TOut> BoundedCapacity=1  --|
               |-- TransformBlock<T, TOut> BoundedCapacity=1  --|
           

vs

BufferBlock<T>  --  TransformBlock<T, TOut> MaxDegreeOfParallelism=4  --  ActionBlock<TOut>

Assume that the worker block performs long-running or I/O-bound work, that there are plenty of physical processor cores to share, and that the order in which the TOut results are produced does not matter.
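
For concreteness, here is a minimal sketch of the two topologies. The Task.Delay stands in for the long-running or I/O-bound work, and the element types and numbers are purely illustrative:

    using System;
    using System.Linq;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow;

    // Option A: fan out to four single-threaded workers, each holding at most one item.
    var buffer = new BufferBlock<int>();
    var sink = new ActionBlock<string>(s => Console.WriteLine(s));

    var workers = Enumerable.Range(0, 4)
        .Select(_ => new TransformBlock<int, string>(
            async n => { await Task.Delay(100); return n.ToString(); },
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 }))
        .ToArray();

    foreach (var worker in workers)
    {
        // A full worker postpones the buffer's offer, so the message goes to the next worker.
        buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });
        worker.LinkTo(sink);
    }

    // Fan-in completion has to be stitched together by hand.
    _ = Task.WhenAll(workers.Select(w => w.Completion))
            .ContinueWith(_ => sink.Complete());

    // Option B: one worker block that processes up to four items concurrently.
    var buffer2 = new BufferBlock<int>();
    var worker2 = new TransformBlock<int, string>(
        async n => { await Task.Delay(100); return n.ToString(); },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
    var sink2 = new ActionBlock<string>(s => Console.WriteLine(s));

    buffer2.LinkTo(worker2, new DataflowLinkOptions { PropagateCompletion = true });
    worker2.LinkTo(sink2, new DataflowLinkOptions { PropagateCompletion = true });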


Solution

    1. The former is less efficient and can mean more allocations, depending on how you implement it.
    2. You would have to control the maximum degree of parallelism yourself, via a custom scheduler or some other synchronisation approach.
    3. You cannot take advantage of the EnsureOrdered option (see the sketch after this list).
    4. The overall pipeline is more complex and needs more boilerplate code.
    5. It is slightly harder to debug (IMO).
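
    To illustrate points 2 and 3: with a single block, the concurrency cap and ordering are just options on ExecutionDataflowBlockOptions, rather than something you build yourself. A minimal sketch (the values are illustrative):

        using System.Threading.Tasks.Dataflow;

        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4, // the block limits concurrency for you
            EnsureOrdered = false       // order doesn't matter here, so let results flow as they finish
                                        // (leave at the default of true to emit outputs in input order)
        };

        var worker = new TransformBlock<int, string>(n => n.ToString(), options);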

    I would just use one standard block per concern unless there was a compelling need to do otherwise, at which point I would look closely at the benefits of a custom block.

    Note: This answer glosses over a lot of points (pros and cons) and lacks specific details that might be relevant to your solution, which can't be known from the information supplied.

    Note 2: All in all, there isn't much difference if all you are doing is running as much work as you can in parallel (apart from the points above).