We detected a weird problem when running a parallel GroupBy on a system with high amount of cores.
We're running this on .Net Framework 4.7.2.
The (simplified) code:
public static void Main()
{
//int MAX_THREADS = Environment.ProcessorCount - 2;
//ThreadPool.SetMinThreads(1, 1);
//ThreadPool.SetMaxThreads(MAX_THREADS, MAX_THREADS);
var elements = new List<ElementInfo>();
for (int i = 0; i < 250000; i++)
elements.Add(new ElementInfo() { Name = "123", Description = "456" });
using (var cancellationTokenSrc = new CancellationTokenSource())
{
var cancellationToken = cancellationTokenSrc.Token;
var dummy = elements.AsParallel()
.WithCancellation(cancellationToken)
.Select(x => new { Name = x.Name })
.GroupBy(x => "abc")
.ToDictionary(g => g.Key, g => g.ToList());
}
}
public class ElementInfo
{
public string Name { get; set; }
public string Description { get; set; }
}
This code is running in an application that is already using about 100 threads. Running this on a "normal" pc (12 or 16 cores), it runs very fast (less than 1 second). Running this on a PC with a high amount of cores (48), it runs very slow (20 seconds).
Taking a dump during the 20 second delay, I see the threads running this LINQ are all waiting in HashRepartitionEnumerator.MoveNext().
There's a m_barrier.Wait(), so I think it is waiting there. It seems to wait on m_barrier, which is set to the number of partitions.
My guess is the following:
The number of partitions is set to the number of cores (48 in this case).
A number of threads are started in the thread pool, but the thread pool is full, so new threads need to be started. This happens at 1 thread per second.
While the threadpool is spinning up threads, all threads already running this LINQ query, are waiting until enough threads are started.
Only when enough threads are started, the LINQ query can finish.
Uncommenting the first lines in the Main method supports this thesis: By limiting the number of threads, the desired amount of threads is never reached, so this LINQ query never finishes.
Does this seem like a bug in .Net Framework, or am I doing something wrong?
Note: the real LINQ query has a few CPU-intensive Where-clauses, which makes it ideal to run in parallel. I removed this code as it isn't needed to reproduce the issue.
Does this seem like a bug in .NET Framework, or am I doing something wrong?
Yes, it does look like a bug, but actually this behavior is by design. The Task Parallel Library depends heavily on the ThreadPool
by default, and the ThreadPool
is not an incredibly clever piece of software. Which is both good and bad. It's good because its behavior is predictable, and it's bad because it behaves non-optimally when stressed. The algorithm that controls its behavior¹ is basically this:
ThreadPool.SetMinThreads
method, which
by default is equal to Environment.ProcessorCount
.This algorithm offers very few configuration options. For example you can't control the injection rate of new threads. So if the behavior of the built-in ThreadPool
doesn't fit your needs, you are in a tough situation. You could consider implementing your own ThreadPool
, in the form of a custom TaskScheduler
, but unfortunately the PLINQ library doesn't even allow to configure the scheduler. There is no public WithTaskScheduler
option available, analogous to the ParallelOptions.TaskScheduler
property that can be used with the Parallel
class (it's internal, due to fear of deadlocks).
Rewriting the PLINQ library from scratch on top of a custom ThreadPool
is presumably not a realistic option. So the best that you can really do is to ensure that the ThreadPool
has always enough threads to satisfy the demand (increase the ThreadPool.SetMinThreads
), specify explicitly the MaxDegreeOfParalellism
whenever you use paralellization, and be conservative regarding the degree of paralellism of each parallel operation. Definitely avoid nesting one parallel operation inside another, because this is the easiest way to saturate the ThreadPool
and cause it to misbehave.
¹ As of .NET 6. The behavior of the ThreadPool
could change in future .NET versions.