Tags: c#, .net, multithreading, linq

Parallel LINQ GroupBy taking long time on systems with high amount of cores


We detected a weird problem when running a parallel GroupBy on a system with a large number of cores.

We're running this on .NET Framework 4.7.2.

The (simplified) code:

    public static void Main()
    {
        //int MAX_THREADS = Environment.ProcessorCount - 2;
        //ThreadPool.SetMinThreads(1, 1);
        //ThreadPool.SetMaxThreads(MAX_THREADS, MAX_THREADS);

        var elements = new List<ElementInfo>();
        for (int i = 0; i < 250000; i++)
            elements.Add(new ElementInfo() { Name = "123", Description = "456" });

        using (var cancellationTokenSrc = new CancellationTokenSource())
        {
            var cancellationToken = cancellationTokenSrc.Token;
            var dummy = elements.AsParallel()
                .WithCancellation(cancellationToken)
                .Select(x => new { Name = x.Name })
                .GroupBy(x => "abc")
                .ToDictionary(g => g.Key, g => g.ToList());
        }
    }

    public class ElementInfo
    {
        public string Name { get; set; }
        public string Description { get; set; }
    }

This code is running in an application that is already using about 100 threads. On a "normal" PC (12 or 16 cores), it runs very fast (less than 1 second). On a PC with a large number of cores (48), it runs very slowly (about 20 seconds).

Taking a dump during the 20-second delay, I can see that the threads running this LINQ query are all waiting in HashRepartitionEnumerator.MoveNext().

[Screenshot: call stack of the waiting threads]

There's an m_barrier.Wait() call, so I think the threads are waiting there. The participant count of m_barrier is set to the number of partitions.

My guess is the following:

  • The number of partitions is set to the number of cores (48 in this case).

  • A number of work items are scheduled on the thread pool, but the pool is already full, so new threads need to be created. This happens at a rate of one thread per second.

  • While the thread pool is spinning up threads, all the threads already running this LINQ query are waiting at the barrier until enough threads have started.

  • Only when enough threads have started can the LINQ query finish.

Uncommenting the first lines in the Main method supports this hypothesis: by capping the number of threads, the required count is never reached, so the LINQ query never finishes.

Does this seem like a bug in the .NET Framework, or am I doing something wrong?

Note: the real LINQ query has a few CPU-intensive Where clauses, which make it ideal for running in parallel. I removed that code as it isn't needed to reproduce the issue.


Solution

  • Does this seem like a bug in .NET Framework, or am I doing something wrong?

    Yes, it does look like a bug, but this behavior is actually by design. The Task Parallel Library depends heavily on the ThreadPool by default, and the ThreadPool is not an incredibly clever piece of software. Which is both good and bad. It's good because its behavior is predictable, and it's bad because it behaves suboptimally when stressed. The algorithm that controls its behavior¹ is basically this:

    1. Instantly satisfy all demands for work until the number of worker threads reaches the value specified by the ThreadPool.SetMinThreads method, which by default is equal to Environment.ProcessorCount.
    2. If the demand for work cannot be satisfied by the available workers, inject more threads into the pool at a rate of one new thread per second.
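    As a quick check of step 1 (a sketch added here, not part of the original answer), the thread pool's configured minimums can be inspected with ThreadPool.GetMinThreads; in a process that hasn't called SetMinThreads, the worker minimum matches Environment.ProcessorCount:

```csharp
using System;
using System.Threading;

class MinThreadsDemo
{
    static void Main()
    {
        // Below these minimums, worker threads are created on demand
        // with no throttling; beyond them, injection is throttled to
        // roughly one new thread per second.
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);

        Console.WriteLine($"ProcessorCount:     {Environment.ProcessorCount}");
        Console.WriteLine($"Min worker threads: {minWorker}");
        Console.WriteLine($"Min IO threads:     {minIo}");
    }
}
```

    With 48 cores and a pool already busy with ~100 threads, PLINQ's request for 48 partitions therefore has to wait for dozens of one-per-second thread injections, which matches the observed ~20-second delay.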

    This algorithm offers very few configuration options. For example, you can't control the injection rate of new threads. So if the behavior of the built-in ThreadPool doesn't fit your needs, you are in a tough situation. You could consider implementing your own thread pool, in the form of a custom TaskScheduler, but unfortunately the PLINQ library doesn't even allow you to configure the scheduler. There is no public WithTaskScheduler option, analogous to the ParallelOptions.TaskScheduler property that can be used with the Parallel class (the option exists but is internal, due to fear of deadlocks).

    Rewriting the PLINQ library from scratch on top of a custom thread pool is presumably not a realistic option. So the best you can really do is ensure that the ThreadPool always has enough threads to satisfy the demand (increase the minimum with ThreadPool.SetMinThreads), specify the MaxDegreeOfParallelism explicitly whenever you use parallelization, and be conservative regarding the degree of parallelism of each parallel operation. Definitely avoid nesting one parallel operation inside another, because this is the easiest way to saturate the ThreadPool and cause it to misbehave.
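    A minimal sketch of those mitigations applied to the question's query. The thread counts below are illustrative assumptions, not recommended values; for PLINQ specifically, the degree of parallelism is capped via WithDegreeOfParallelism (MaxDegreeOfParallelism is the name used by the Parallel class):

```csharp
using System;
using System.Linq;
using System.Threading;

class Mitigations
{
    static void Main()
    {
        // 1. Raise the thread-pool minimum so PLINQ's workers are created
        //    immediately instead of being injected one per second.
        //    (64 is an arbitrary illustrative value; size it to your workload.)
        ThreadPool.SetMinThreads(64, 64);

        var elements = Enumerable.Range(0, 250000)
            .Select(i => new { Name = "123" })
            .ToList();

        // 2. Cap the degree of parallelism explicitly, so the number of
        //    partitions (and therefore required threads) stays conservative.
        var dummy = elements.AsParallel()
            .WithDegreeOfParallelism(Math.Min(Environment.ProcessorCount, 8))
            .GroupBy(x => "abc")
            .ToDictionary(g => g.Key, g => g.ToList());

        // All 250000 elements share the key "abc", so there is one group.
        Console.WriteLine(dummy["abc"].Count);
    }
}
```

    With the partition count capped, the barrier inside the hash repartitioning step only waits for a handful of participants, so the query no longer depends on the pool injecting dozens of new threads.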

    ¹ As of .NET 6. The behavior of the ThreadPool could change in future .NET versions.