Tags: c#, async-await, parallel-processing, task, task-parallel-library

Parallelism with async method: Why do I need to do Task.Run() here and can I avoid it?


I am trying to improve my understanding of concurrency in C# and ran into a question with a small toy problem. Consider an async method async Task<int> CountAsync(int id) that counts something based on a given ID; to do so it has to make a web request, for example, which is why it is async. Now I would like to parallelize the counting for multiple different IDs. I assumed that not awaiting each CountAsync call in a loop would do the job.

var countTasks = new List<Task<int>>();
for (int i = 1; i <= upUntil; i++)
{
    countTasks.Add(CountAsync(i));
}
await Task.WhenAll(countTasks);
var count = countTasks.Select(x => x.Result).Sum();

However, it turns out that this is just as fast as simply calling and awaiting each CountAsync inside the loop. To get the counting to actually run in parallel, I have to use Task.Run() and write my code like this:

var countTasks = new List<Task<int>>();
for (int i = 1; i <= upUntil; i++)
{
    var id = i; // copy per iteration: the lambda would otherwise capture the shared loop variable i
    countTasks.Add(Task.Run(() => CountAsync(id)));
}
await Task.WhenAll(countTasks);
var count = countTasks.Select(x => x.Result).Sum();

Or use the alternative with AsParallel() to get some more control over the degree of parallelism.

count = Enumerable.Range(1, upUntil)
    .AsParallel()
    .WithDegreeOfParallelism(16)
    .Select(CountAsync)
    .Sum(t => t.Result);

Reading "Concurrency in C#" by Stephen Cleary made me aware that parallelism and asynchronous methods are different things, and that I am actually in a parallel processing scenario here. However, I would have assumed that not awaiting CountAsync in my first loop, and simply collecting the tasks returned by the async method, would also speed up execution, as the work could be offloaded to a different thread.

I obviously lack understanding here and would like to figure out what happens behind the scenes in my first example. Also: are my other code examples the way to go when having a parallel operation with async methods? I do not like the use of .Result.

Edit: As requested, I am adding the CountAsync that I use for my playground example. It counts the number of "A"s in a random "AB" string. Task.Delay is used to make it slow and async, much like the fiddle @Auditive posted. I am curious about the fiddle, because it seems to match what I originally thought would happen. One detail that might be important: the calling code lives inside a GET controller method of a .NET 8 ASP.NET Core application.

private static async IAsyncEnumerable<string> GetObjectsSlowlyAsync(int id)
{
    var random = new Random(id);
    
    for (int i = 0; i < 1000000; i++)
    {
        if (random.Next(0, 10) == 1)
            yield return "A";
        else
            yield return "B";

        await Task.Delay(TimeSpan.FromMicroseconds(random.Next(400, 800)));
    }
}

public static async Task<int> CountAsync(int id)
{
    var count = 0;
    await foreach (var x in GetObjectsSlowlyAsync(id))
    {
        if (x == "A") count++;
    }
    return count;
}

The difference I experience between the two ways of calling CountAsync is almost ten-fold. [Two screenshots]


Solution

  • await Task.Delay(TimeSpan.FromMicroseconds(random.Next(400, 800)));
    

    It seems that you have high expectations about the ability of .NET timers to fire at sub-millisecond intervals. In reality, Task.Delay truncates any timespan shorter than one millisecond to zero:

    Task task = Task.Delay(TimeSpan.FromMicroseconds(999));
    Console.WriteLine(ReferenceEquals(task, Task.CompletedTask)); // True
    


    The await in your code awaits a completed Task, so it's equivalent to this:

    await Task.CompletedTask;
    

    The result is that your GetObjectsSlowlyAsync is completely synchronous. It is neither slow nor async.

    This makes it obvious why you need Task.Run to parallelize this code: synchronous code does not parallelize by itself.
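The effect can be observed directly by checking whether a task is already finished at the moment the async method returns. The following is a minimal sketch, assuming .NET 7 or later (for TimeSpan.FromMicroseconds); DelayThenReturnAsync is a hypothetical helper that is not part of the question's code.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: awaits a single delay, then returns the id unchanged.
async Task<int> DelayThenReturnAsync(int id, TimeSpan delay)
{
    await Task.Delay(delay);
    return id;
}

// Sub-millisecond delay: truncated to zero, so the await never yields and
// the whole method runs synchronously on the calling thread.
Task<int> subMillisecond = DelayThenReturnAsync(1, TimeSpan.FromMicroseconds(500));
bool subMillisecondCompleted = subMillisecond.IsCompleted;

// A delay of whole milliseconds: the method returns an incomplete task
// at its first await, and the caller is free to start more work.
Task<int> realDelay = DelayThenReturnAsync(2, TimeSpan.FromMilliseconds(200));
bool realDelayCompleted = realDelay.IsCompleted;

Console.WriteLine($"500 microsecond delay completed on return: {subMillisecondCompleted}");
Console.WriteLine($"200 millisecond delay completed on return: {realDelayCompleted}");

await realDelay;
```

If the delays in GetObjectsSlowlyAsync were at least one millisecond, the first loop in the question (collecting the tasks and awaiting Task.WhenAll) should overlap the waits without any Task.Run.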


    As a side note, this PLINQ query is not doing what you expect it to do.

    count = Enumerable.Range(1, upUntil)
        .AsParallel()
        .WithDegreeOfParallelism(16)
        .Select(CountAsync)
        .Sum(t => t.Result);
    

    The PLINQ library is not async-friendly. If CountAsync were actually asynchronous, the Select operator would project all the inputs into incomplete tasks very quickly. The WithDegreeOfParallelism policy would limit only the creation of the tasks, not the maximum number of tasks that are in flight concurrently. Then the .Sum(t => t.Result) would block a bunch of ThreadPool threads, plus the current thread, waiting for these tasks to complete, for no good reason. The Task.WaitAll method could do the same while blocking only the current thread.
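One async-friendly way to cap the number of in-flight operations is Parallel.ForEachAsync, available since .NET 6. The following is a minimal sketch, not a drop-in replacement: this CountAsync is a hypothetical stand-in that really does yield, and the limit of 4 and the in-flight bookkeeping are there only for illustration.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's CountAsync: it yields for 20 ms
// and returns the id itself as the "count".
async Task<int> CountAsync(int id)
{
    await Task.Delay(20);
    return id;
}

int upUntil = 20;
int total = 0;
int inFlight = 0;
int maxInFlight = 0;
object gate = new object();

// MaxDegreeOfParallelism limits how many bodies run concurrently,
// including the time they spend awaiting.
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };

await Parallel.ForEachAsync(Enumerable.Range(1, upUntil), options, async (id, cancellationToken) =>
{
    lock (gate)
    {
        inFlight++;
        maxInFlight = Math.Max(maxInFlight, inFlight);
    }

    int partial = await CountAsync(id); // no .Result, no blocked thread

    lock (gate)
    {
        inFlight--;
        total += partial;
    }
});

Console.WriteLine($"total = {total}, max in flight = {maxInFlight}");
```

No thread is blocked while the operations are pending, and the results are accumulated under a lock instead of being read through .Result. When no concurrency limit is needed, awaiting Task.WhenAll on a collection of Task<int> returns the results as an int[], which also avoids .Result.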