Tags: c#, multithreading, asynchronous, concurrency, parallel-processing

ParallelOptions 'MaxDegreeOfParallelism' property - why can I specify more threads than my hardware has?


I have a number of questions about how a Parallel.ForEach loop operates, especially with regard to setting the ParallelOptions.MaxDegreeOfParallelism property.

My computer's CPU is quad-core with 8 logical processors.

To me, the following should be the maximum number of operations that can run in parallel, since after all I have 8 threads available:

            ParallelOptions parallelOptions = new()
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount, // 8 on my machine
            };

Consider the following code. It simply iterates over a list of 300 URIs and does "something" with the responses:

            List<string> uriList = new List<string>();

            for (int i = 0; i < 300; i++)
            {
                uriList.Add(uri); // uri: a URI string defined elsewhere (not shown)
            }

            HttpClient httpClient = new HttpClient();

            ParallelOptions parallelOptions = new()
            {
                MaxDegreeOfParallelism = uriList.Count, // 300
            };

            await Parallel.ForEachAsync(uriList, parallelOptions, async (uri, token) =>
            {
                var response = await httpClient.GetStringAsync(uri);

                if (response != null)
                {
                    ProcessResponse(response);
                }
            });

Note that the MaxDegreeOfParallelism property is set to the size of the uriList, i.e. 300, rather than to the 8 threads my physical hardware provides. This code works, and I'm lost as to why setting the MaxDegreeOfParallelism property that high "works".

Questions:

  1. Can the MaxDegreeOfParallelism property be set to any number, with the maximum number of concurrent operations still only ever being as high as the hardware's available thread count?

  2. Can the MaxDegreeOfParallelism property be thought of as setting how many parallel "batches" of work are carried out concurrently? For example, will iterating over a list of 16 items with MaxDegreeOfParallelism set to 8 cause two batches of 8 concurrent calls?

  3. If the MaxDegreeOfParallelism property isn't set, does it default to the maximum number of threads available?

  4. In an asynchronous Parallel.ForEachAsync loop, do subsequent requests wait until the previous "batched" calls have returned? Or, when an await is encountered and a thread is "freed", can another item in the uriList begin its logical steps in the loop?

  5. Is there a "sweet spot" for the setting of the MaxDegreeOfParallelism property?


Solution

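  • Your code works as intended because I/O-bound asynchronous operations, for the most part, don't use threads. While a request is in flight no thread is blocked waiting for it, so 300 concurrent operations do not need 300 threads.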
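
    To make this concrete, here is a minimal sketch that counts how many operations are in flight at once. It assumes .NET 6 or later, and it uses Task.Delay as a stand-in for the HTTP call so that no network is needed; the counter names are made up for the illustration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int inFlight = 0;      // operations currently between start and completion
int peakInFlight = 0;  // highest value of inFlight observed
int completed = 0;     // operations that ran to completion

var options = new ParallelOptions { MaxDegreeOfParallelism = 300 };

await Parallel.ForEachAsync(Enumerable.Range(0, 300), options, async (item, token) =>
{
    int now = Interlocked.Increment(ref inFlight);

    // Record the peak with a compare-and-swap loop.
    int seen;
    while (now > (seen = Volatile.Read(ref peakInFlight)))
    {
        Interlocked.CompareExchange(ref peakInFlight, now, seen);
    }

    // Simulated I/O (assumption): no thread is blocked while the delay is pending.
    await Task.Delay(500, token);

    Interlocked.Decrement(ref inFlight);
    Interlocked.Increment(ref completed);
});

Console.WriteLine($"Logical processors:        {Environment.ProcessorCount}");
Console.WriteLine($"Peak operations in flight: {peakInFlight}");
Console.WriteLine($"Thread pool threads now:   {ThreadPool.ThreadCount}");
```

    On a typical machine the peak should come out far above the number of logical processors, while the thread pool stays small, because each pending delay holds no thread.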

    The Parallel family of methods schedules work on the ThreadPool by default. You can change where the work is scheduled by supplying a custom TaskScheduler, but this is rarely necessary; in most cases the ThreadPool is fine. The ThreadPool can create far more threads than the machine has processors/cores. If it needs to, it can create thousands of threads¹. All of these threads share the machine's available processors/cores. If there are more active (non-sleeping) threads than cores, the operating system divides the available processing power among them, so every thread gets its time slices and makes progress.
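
    If you want to see the pool's limits on your own machine, a small sketch like the following prints them next to the processor count. The exact numbers depend on the runtime and its configuration, so none are shown here.

```csharp
using System;
using System.Threading;

// Worker-thread limits of the pool; the I/O completion thread limits are discarded.
ThreadPool.GetMinThreads(out int minWorkers, out _);
ThreadPool.GetMaxThreads(out int maxWorkers, out _);

Console.WriteLine($"Logical processors:      {Environment.ProcessorCount}");
Console.WriteLine($"Pool min worker threads: {minWorkers}");
Console.WriteLine($"Pool max worker threads: {maxWorkers}");
```

    The minimum is the number of threads the pool creates on demand without delay; beyond it, new threads are injected more gradually, up to the maximum.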

    MaxDegreeOfParallelism doesn't work in batches. As soon as the processing of one element completes, the processing of another element starts. The loop doesn't wait for a whole batch to complete before starting the next one.
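
    The absence of batching can be observed with a sketch like this one, which runs one slow item and three fast items with a limit of 2. The durations are made up for the demonstration, and Task.Delay again stands in for real I/O.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Item 0 is slow, items 1..3 are fast (illustrative durations).
int[] durationsMs = { 1000, 50, 50, 50 };
long[] startedMs = new long[durationsMs.Length];
long[] finishedMs = new long[durationsMs.Length];

var clock = Stopwatch.StartNew();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

await Parallel.ForEachAsync(Enumerable.Range(0, durationsMs.Length), options, async (index, token) =>
{
    startedMs[index] = clock.ElapsedMilliseconds;
    await Task.Delay(durationsMs[index], token);
    finishedMs[index] = clock.ElapsedMilliseconds;
});

for (int i = 0; i < durationsMs.Length; i++)
{
    Console.WriteLine($"item {i}: started at {startedMs[i]} ms, finished at {finishedMs[i]} ms");
}
```

    If the loop worked in batches of 2, items 2 and 3 could not start until the slow item 0 had finished. Here they start as soon as the slot used by item 1 becomes free, well before item 0 completes.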

    The default MaxDegreeOfParallelism for Parallel.ForEachAsync is Environment.ProcessorCount; for all other Parallel methods it is -1, which means unlimited parallelism. My suggestion is to always specify MaxDegreeOfParallelism explicitly when using any Parallel method. Note that this advice differs from what Microsoft recommends.
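
    As an illustration of setting the limit explicitly, the sketch below prints the value a fresh ParallelOptions carries and then runs a synchronous, CPU-bound Parallel.ForEach with an explicit limit. The sum-of-squares workload is only a placeholder.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// A freshly constructed ParallelOptions carries -1 (no explicit limit).
var defaults = new ParallelOptions();
Console.WriteLine($"Default MaxDegreeOfParallelism: {defaults.MaxDegreeOfParallelism}");

// Explicit limit for a synchronous, CPU-bound loop.
var cpuOptions = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

long sum = 0;
Parallel.ForEach(Enumerable.Range(1, 1000), cpuOptions, n =>
{
    // Placeholder workload: accumulate n squared in a thread-safe way.
    Interlocked.Add(ref sum, (long)n * n);
});

Console.WriteLine($"Sum of squares 1..1000: {sum}");
```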

    The sweet spot for MaxDegreeOfParallelism for CPU-bound operations is Environment.ProcessorCount, although oversubscription might help if the workload is unbalanced. Parallel.ForEachAsync is typically used for I/O-bound asynchronous operations, where the sweet spot depends on the capabilities of the remote server or the bandwidth of the network.
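
    For the I/O-bound case, one way to look for the sweet spot is simply to measure. The sketch below is a toy model, not a benchmark: it assumes a hypothetical server that can handle only 4 requests at a time, simulated with a SemaphoreSlim and Task.Delay. FakeRequestAsync, serverSlots, and all the numbers are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical remote server: serves at most 4 requests at once, 50 ms each.
var serverSlots = new SemaphoreSlim(4);

async ValueTask FakeRequestAsync(int item, CancellationToken token)
{
    await serverSlots.WaitAsync(token);  // queue up while the server is saturated
    try { await Task.Delay(50, token); }
    finally { serverSlots.Release(); }
}

var elapsedByLimit = new Dictionary<int, long>();

foreach (int limit in new[] { 1, 4, 16 })
{
    var options = new ParallelOptions { MaxDegreeOfParallelism = limit };
    var clock = Stopwatch.StartNew();

    await Parallel.ForEachAsync(Enumerable.Range(0, 32), options,
        (item, token) => FakeRequestAsync(item, token));

    elapsedByLimit[limit] = clock.ElapsedMilliseconds;
    Console.WriteLine($"MaxDegreeOfParallelism = {limit,2}: {elapsedByLimit[limit]} ms");
}
```

    In this model, raising the limit from 1 to 4 should cut the elapsed time substantially, while going from 4 to 16 should gain little, because the simulated server is already saturated. With a real server, the same kind of measurement against realistic load is what reveals the useful limit.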

    ¹ For more details about the behavior of the ThreadPool, you can look here.