c# tpl-dataflow

TPL Dataflow is idle for no apparent reason


Consider the following Dataflow pipeline on a 16-core machine:

  • Extract - TransformBlock. BoundedCapacity: 32, DOP: 16
  • Download - TransformBlock. BoundedCapacity: 1,024, DOP: 64
  • Process - TransformBlock. BoundedCapacity: 50,000, DOP: 16
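For reference, a pipeline with this configuration might be declared roughly as follows. This is only a sketch: the element types (string, byte[], int) and the delegate bodies are placeholders and not the actual code behind the question. Only the BoundedCapacity and MaxDegreeOfParallelism values come from the description above.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Element types and delegate bodies are hypothetical placeholders.
var extract = new TransformBlock<string, string>(
    path => path, // stand-in for the extraction step
    new ExecutionDataflowBlockOptions { BoundedCapacity = 32, MaxDegreeOfParallelism = 16 });

var download = new TransformBlock<string, byte[]>(
    url => Array.Empty<byte>(), // stand-in for the download step
    new ExecutionDataflowBlockOptions { BoundedCapacity = 1024, MaxDegreeOfParallelism = 64 });

var process = new TransformBlock<byte[], int>(
    data => data.Length, // stand-in for the processing step
    new ExecutionDataflowBlockOptions { BoundedCapacity = 50_000, MaxDegreeOfParallelism = 16 });

// Link the blocks in pipeline order and let completion flow downstream.
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
extract.LinkTo(download, linkOptions);
download.LinkTo(process, linkOptions);
```

The InputCount and OutputCount properties of each TransformBlock are one way to obtain per-block queue sizes for tracing.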

This is the pipeline's order of execution: Extract --> Download --> Process

We've observed that from time to time our pipeline gets "stuck" and stops consuming messages. We added some traces to check what is happening inside the blocks, and they confirmed that this is the case. For example:

Timestamp                       BlockName   Input   Output  DOP Total
2022-02-24 17:16:21.0160704     Extract     0       32      0   32
2022-02-24 17:16:21.0160704     Download    0       921     1   922
2022-02-24 17:16:21.0160704     Process     0       0       1   1
2022-02-24 17:16:21.0785734     Extract     0       32      0   32
2022-02-24 17:16:21.0785734     Download    0       921     1   922
2022-02-24 17:16:21.0785734     Process     0       0       1   1
2022-02-24 17:16:21.1254470     Extract     0       32      0   32
2022-02-24 17:16:21.1254470     Download    0       1024    0   1024
2022-02-24 17:16:21.1254470     Process     0       0       1   1
2022-02-24 17:16:21.1723229     Extract     0       32      0   32
2022-02-24 17:16:21.1723229     Download    0       1024    0   1024
2022-02-24 17:16:21.1723229     Process     0       0       1   1
2022-02-24 17:16:21.2191997     Extract     0       32      0   32
2022-02-24 17:16:21.2191997     Download    0       1024    0   1024
2022-02-24 17:16:21.2191997     Process     0       0       1   1
2022-02-24 17:16:21.2660764     Extract     0       32      0   32
2022-02-24 17:16:21.2660764     Download    0       1024    0   1024
2022-02-24 17:16:21.2660764     Process     0       0       1   1
2022-02-24 17:16:21.3285760     Extract     0       32      0   32
2022-02-24 17:16:21.3285760     Download    0       1024    0   1024
2022-02-24 17:16:21.3285760     Process     0       0       1   1
2022-02-24 17:16:21.3754516     Extract     0       32      0   32
2022-02-24 17:16:21.3754516     Download    0       1024    0   1024
2022-02-24 17:16:21.3754516     Process     0       0       1   1
2022-02-24 17:16:21.4223896     Extract     0       29      0   29
2022-02-24 17:16:21.4223896     Download    0       992     15  1007
2022-02-24 17:16:21.4223896     Process     0       7       1   8

As you can see, Download only started moving its messages at 2022-02-24 17:16:21.4223896, even though it was already "full" at 2022-02-24 17:16:21.1254470.

My question is: what happened during these 297 ms? The thread count at that particular time wasn't high at all...


Solution

  • Judging from your other recent question, my guess is that the behavior of your pipeline is dominated by the behavior of a heavily saturated ThreadPool. The blocks are competing with each other for the few, slowly increasing ThreadPool threads that are available, which makes the MaxDegreeOfParallelism configuration of the blocks mostly irrelevant for approximately the first couple of minutes after the application starts. Eventually the ThreadPool will grow enough to satisfy the demand, but with an injection rate of only about one new thread per second, this will take a while. Since your application makes such heavy use of the ThreadPool, it might be a good idea to call the ThreadPool.SetMinThreads method at the start of the program, and configure the minimum number of threads that the ThreadPool will create instantly on demand before switching to the slow injection algorithm.
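A minimal sketch of the ThreadPool.SetMinThreads suggestion, meant to run once at program start. The target of 96 worker threads is an assumption: it is simply the sum of the three DOP values from the question (16 + 64 + 16), not a measured or recommended number, and the right value depends on the workload.

```csharp
using System;
using System.Threading;

// Read the current minimums so the completion-port (I/O) value is left unchanged.
ThreadPool.GetMinThreads(out int workerMin, out int ioMin);

// Assumed target: the sum of the blocks' DOP settings (16 + 64 + 16 = 96).
int desiredWorkers = Math.Max(workerMin, 16 + 64 + 16);

// SetMinThreads returns false if the request is rejected
// (for example, when it exceeds the configured maximum).
bool applied = ThreadPool.SetMinThreads(desiredWorkers, ioMin);
Console.WriteLine($"SetMinThreads({desiredWorkers}, {ioMin}) applied: {applied}");
```

Raising the minimum only changes how quickly the pool hands out threads under load. It does not reduce the number of threads that the blocked work needs.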

    Alternatively, you could consider converting your synchronous work to asynchronous, if asynchronous APIs are available for whatever you are doing, in order to minimize the number of threads required.
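The asynchronous alternative could look roughly like the sketch below. DownloadAsync is a hypothetical stand-in that uses Task.Delay to simulate an asynchronous I/O call, and the int result type is an assumption. The point is only that TransformBlock accepts a delegate returning a Task, so no ThreadPool thread is held while the operation is in flight.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical asynchronous operation; a real one would call an async I/O API.
static async Task<int> DownloadAsync(string url)
{
    await Task.Delay(10); // simulated I/O wait, no thread is blocked here
    return url.Length;    // placeholder result
}

// Same capacity and DOP as the Download block in the question,
// but with a Task-returning delegate instead of a synchronous one.
var download = new TransformBlock<string, int>(
    url => DownloadAsync(url),
    new ExecutionDataflowBlockOptions { BoundedCapacity = 1024, MaxDegreeOfParallelism = 64 });

await download.SendAsync("https://example.com/a");
download.Complete();

int length = await download.ReceiveAsync();
Console.WriteLine(length);
```

With an asynchronous delegate, MaxDegreeOfParallelism limits the number of operations in flight rather than the number of threads blocked at once.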