c#, parallel-processing, parallel.foreach

Process not running in parallel using ParallelForEachAsync


I am testing running Python via Process.Start in parallel.

My machine has a 2.8 GHz CPU with 4 cores and 8 logical processors.

My main console application is as below

    static void Main(string[] args) => MainAsync(args).GetAwaiter().GetResult();

    static async Task MainAsync(string[] args)
    {            
        var startTime = DateTime.UtcNow;
        Console.WriteLine($"Execution started at {DateTime.UtcNow:T}");

        await ExecuteInParallelAsync(args).ConfigureAwait(false);
        Console.WriteLine($"Executions completed at {DateTime.UtcNow:T}");
        var endTime = DateTime.UtcNow;

        var duration = (endTime - startTime);
        Console.WriteLine($"Execution took {duration.TotalMilliseconds} milliseconds {duration.TotalSeconds} seconds");

        Console.WriteLine("Press Any Key to close");
        Console.ReadKey();            
    }

Where ExecuteInParallelAsync is the method that does the work...

    private static async Task ExecuteInParallelAsync(string[] args)
    {
        var executionNumbers = new List<int>();
        var executions = 5;

        for (var executionNumber = 1; executionNumber <= executions; executionNumber++)
        {
            executionNumbers.Add(executionNumber);
        }

        await executionNumbers.ParallelForEachAsync(async executionNumber =>
        {
             Console.WriteLine($"Execution {executionNumber} of {executions} {DateTime.UtcNow:T}");
            ExecuteSampleModel();
            Console.WriteLine($"Execution {executionNumber} complete {DateTime.UtcNow:T}");
        }).ConfigureAwait(false);
    }

ExecuteSampleModel runs the Python model...

    IModelResponse GetResponse()
    { 
        _actualResponse = new ModelResponse();

        var fileName = $@"main.py";

        var p = new Process();
        p.StartInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\python.exe", fileName)
        {
            WorkingDirectory = RootFolder,
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        p.Start();

        _actualResponse.RawResponseFromModel = p.StandardOutput.ReadToEnd();
        p.WaitForExit();

        return _actualResponse;
    }

As you can see, I am asking for this model to be executed 5 times.

When I use the debugger, it appears that even though I am using ParallelForEachAsync (introduced by the AsyncEnumerator package), this is not being run in parallel.

I thought that each iteration is run on its own thread?

Each Python Model execution takes 5 seconds.

Running in parallel, I would expect the whole process to be done in 15 seconds or so, but it actually takes 34 seconds.

The Console.WriteLine calls added before and after the call to GetResponse show that the first call starts, runs to completion, then the second starts, and so on.

Is this something to do with me calling Process.Start?

Can anyone see anything wrong with this?

Paul


Solution

  • To make the answer useful, here is an explanation of what happened with the async code. Omitting a lot of details that aren't important from the standpoint of this explanation, the code inside the ParallelForEachAsync loop looks as follows:

    // some preparations
    ...
    var itemIndex = 0L;
    while (await enumerator.MoveNextAsync(cancellationToken).ConfigureAwait(false))
    {
        ...
        Task itemActionTask = null;
        try
        {
            itemActionTask = asyncItemAction(enumerator.Current, itemIndex);
        }
        catch (Exception ex)
        {
           // some exception handling
        }
        ...
        itemIndex++;
    }
    

    where asyncItemAction has the type Func<T, long, Task> and is a wrapper around the custom asynchronous action of type Func<T, Task> that is passed as a parameter to the ParallelForEachAsync call (the wrapper adds indexing functionality). The loop code simply calls this action to obtain a task that represents the promise of the asynchronous operation, and then waits for its completion. In the case of the given code example, the custom action

    async executionNumber =>
    {
         Console.WriteLine($"Execution {executionNumber} of {executions} {DateTime.UtcNow:T}");
         ExecuteSampleModel();
         Console.WriteLine($"Execution {executionNumber} complete {DateTime.UtcNow:T}");
    }
    

    contains no asynchronous code, but the async prefix lets the compiler generate a state machine whose method returns a Task, which makes this code compliant (from the syntax standpoint) with the custom action call inside the loop. The important thing is that the code inside the loop expects this operation to be asynchronous, which implies that the operation is implicitly split into a synchronous part, executed as part of the asyncItemAction(enumerator.Current, itemIndex) call, and at least one asynchronous part (one or more, depending on the number of awaits inside) that can execute while the loop iterates over other items. The following pseudo-code, together with the concrete example after it, gives an idea of that:

     {
         ... synchronous part
         await SomeAsyncOperation();
         ... asynchronous part
     } 
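
    To make the split concrete, here is a small illustrative method (not part of the original code; Task.Delay merely stands in for some real asynchronous work):

    async Task ProcessItemAsync(int n)
    {
        Console.WriteLine($"synchronous part of item {n}");  // runs during the asyncItemAction(...) call
        await Task.Delay(1000).ConfigureAwait(false);        // the caller gets back a pending Task here
        Console.WriteLine($"asynchronous part of item {n}"); // continuation, can overlap with other items
    }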
    

    In this particular case there is no async part in the custom action at all, which means that the call

     itemActionTask = asyncItemAction(enumerator.Current, itemIndex);
    

    will be executed synchronously, and the next iteration of the loop won't start until asyncItemAction has completed the entire custom action execution.
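
    This is easy to verify with a standalone sketch (not from the original post; Thread.Sleep stands in for the blocking Process.Start and ReadToEnd calls):

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class SyncLambdaDemo
    {
        static void Main()
        {
            // An async lambda with no await in it runs entirely on the caller's thread.
            Func<int, Task> action = async n =>
            {
                Thread.Sleep(1000);                   // stands in for the blocking process call
                Console.WriteLine($"item {n} done");
            };

            var task = action(1);                     // the caller is blocked for the full second
            Console.WriteLine(task.IsCompleted);      // prints True - there is nothing left to await
        }
    }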

    That's why switching off the asynchrony in the code and using simple parallelism helps.
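
    For completeness, here is a sketch (not from the original answer) of two ways to achieve that, reusing the names from the question (executionNumbers, executions, ExecuteSampleModel) and intended as a replacement for the loop inside ExecuteInParallelAsync:

    // Option 1: keep ParallelForEachAsync, but hand the blocking work to the
    // thread pool so each iteration returns a genuinely pending Task.
    await executionNumbers.ParallelForEachAsync(async executionNumber =>
    {
        Console.WriteLine($"Execution {executionNumber} of {executions} {DateTime.UtcNow:T}");
        await Task.Run(() => ExecuteSampleModel()).ConfigureAwait(false);
        Console.WriteLine($"Execution {executionNumber} complete {DateTime.UtcNow:T}");
    }).ConfigureAwait(false);

    // Option 2: drop the asynchrony altogether and use plain data parallelism.
    Parallel.ForEach(executionNumbers, executionNumber =>
    {
        Console.WriteLine($"Execution {executionNumber} of {executions} {DateTime.UtcNow:T}");
        ExecuteSampleModel();
        Console.WriteLine($"Execution {executionNumber} complete {DateTime.UtcNow:T}");
    });

    With either option the five executions should be able to overlap instead of running one after another.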