
Why does the return statement execute before Parallel.ForEach finishes?


I want to use Parallel.ForEach to process one collection and return another, but the collection I get back is empty. The body of the Parallel.ForEach needs to call an asynchronous method.

This is a .NET Core 2.2 console app on Windows 10.

public static ConcurrentBag<int> GetList()
{
    ConcurrentBag<int> result = new ConcurrentBag<int>() ;
    List<int> list = new List<int> { 1, 2, 3 };
    Parallel.ForEach(list, async i => {
        await Task.Delay(i * 1000);
        result.Add(i * 2);
    });
    return result;
}

public static void Main(String[] args)
{
    List<int> list = new List<int>();
    var res = GetList();
    list.AddRange(res);
    Console.WriteLine("Beginning.");
    foreach (var item in list)
    {
        Console.WriteLine(item);
    }
    Console.ReadLine();
}

I expect {2, 4, 6}, but the actual result is empty.


Solution

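  • await is the culprit. The key point is that await is a fancy return: at the first await that actually has to wait, the async method returns to its caller and the rest of the method runs later. If the caller doesn't understand tasks, all it sees is the return. Parallel.ForEach doesn't expect a delegate that returns a Task, so the async lambda is compiled as an async void delegate and Parallel.ForEach has no way to wait for the awaited work to complete.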

    Parallel.ForEach therefore finishes almost as soon as it starts, long before anything is written to result.

    Now, Parallel is meant for CPU-bound operations, whose delegates are synchronous, so this is not normally a problem. If you want to simulate a long-running CPU-bound operation, use Thread.Sleep instead of await Task.Delay.
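
    As a rough sketch of that suggestion, here is the GetList method from the question with the async lambda replaced by a synchronous one. The class name ParallelSleepDemo and the OrderBy call in Main are additions for illustration only, and Thread.Sleep merely stands in for real CPU-bound work:

    ```csharp
    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    // Hypothetical wrapper class; not part of the original question.
    public static class ParallelSleepDemo
    {
        public static ConcurrentBag<int> GetList()
        {
            var result = new ConcurrentBag<int>();
            var list = new List<int> { 1, 2, 3 };

            // The delegate is synchronous, so Parallel.ForEach only returns
            // after every iteration has run to completion.
            Parallel.ForEach(list, i =>
            {
                Thread.Sleep(i * 1000); // stand-in for CPU-bound work
                result.Add(i * 2);
            });

            return result;
        }

        public static void Main()
        {
            // ConcurrentBag is unordered, so sort before printing.
            foreach (var item in GetList().OrderBy(x => x))
            {
                Console.WriteLine(item);
            }
        }
    }
    ```

    Because nothing in the lambda returns early, the bag already holds 2, 4 and 6 by the time GetList returns.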

    As a side note, how would you parallelize a task-based operation that is I/O bound? The simplest way would be something like this:

    await Task.WhenAll(list.Select(YourAsyncOperation));
    

    Here YourAsyncOperation is an async method returning Task, which can use Task.Delay as much as it wants. The main caveat with this simple approach is that YourAsyncOperation must reach an actual await quickly, because everything before its first await runs synchronously on the calling thread, and ideally it shouldn't capture a synchronization context. Otherwise, in the worst case, all of the calls end up serialized. Well, really, in the absolute worst case, you get a deadlock, but... :)
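
    Applied to the code in the question, the Task.WhenAll approach might look roughly like the sketch below. DoubleAfterDelayAsync is a hypothetical stand-in for YourAsyncOperation that returns Task<int> so the results can be collected, and WhenAllDemo and GetListAsync are likewise made-up names. An async Main needs C# 7.1 or later, so this assumes the project's language version allows it:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    // Hypothetical wrapper class; not part of the original question.
    public static class WhenAllDemo
    {
        // Stand-in for YourAsyncOperation: awaits right away, then returns a value.
        public static async Task<int> DoubleAfterDelayAsync(int i)
        {
            await Task.Delay(i * 1000);
            return i * 2;
        }

        public static async Task<int[]> GetListAsync()
        {
            var list = new List<int> { 1, 2, 3 };

            // Select starts one operation per item; Task.WhenAll completes once
            // all of them have finished and yields the results in input order.
            return await Task.WhenAll(list.Select(i => DoubleAfterDelayAsync(i)));
        }

        public static async Task Main()
        {
            foreach (var item in await GetListAsync())
            {
                Console.WriteLine(item);
            }
        }
    }
    ```

    Since the three delays overlap, the whole call takes about as long as the longest one (roughly three seconds) rather than the sum, and no ConcurrentBag is needed because each task hands back its own result.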