Slow parallelizing of network-bound I/O when timeouts occur


I am parallelizing a method that relies heavily on WinAPI NetAPI32 calls. The calls sometimes time out when the user enters a host that is down, or several down hosts in a list of hundreds.

int prevThreads, prevPorts;
ThreadPool.GetMinThreads(out prevThreads, out prevPorts);
ThreadPool.SetMinThreads(20, prevPorts);

var parallelScanList = computersToScan.AsParallel().WithExecutionMode(ParallelExecutionMode.ForceParallelism).WithDegreeOfParallelism(20);

Api.WinApi.AdvApi.LogonAndImpersonate(connection.UserCredential);

foreach (var computer in parallelScanList)
{
        //...
        //this takes a long time to timeout
        status = NetApi.NetUserEnum(computer.DnsHostname, 2,
                (int)NetApi.NetUserEnumFilter.FILTER_NORMAL_ACCOUNT,
                out userbufPtr, (int)LmCons.MAX_PREFERRED_LENGTH, out userEntriesRead, out totalEntries,
                out userResumeHandle);

}

We have similar logic in a C client that uses a producer/consumer pattern: spin up 20 threads and have them read from a list until it's depleted.

function StartProcessingHosts()
{
  for 1 to 20
     StartThread(ProcessHostsThread)
}

function ProcessHostsThread()
{
  while(moreHosts)
  {
     //obviously synchronization around here
     var host = popHost();
     DoSomething(host);
  }
}

And that's very fast, because the threads spend most of their time waiting on these network calls (or on the timeout when a host is down), so the waits overlap instead of adding up.

The way I'm currently doing it in C# seems to process the hosts one at a time.


Solution

  • PLINQ, short for Parallel LINQ, is for, you guessed it, parallelizing LINQ queries. For example, if you write collection.AsParallel().Where(/* some condition */).Select(/* some projection */).ToList(), then the Where() and Select() will execute in parallel.

    But you don't do that. You call AsParallel(), which says "the following LINQ query should execute in parallel". Then you configure the parallelism of that upcoming query by calling WithExecutionMode() and WithDegreeOfParallelism(). But you never actually write a LINQ query. Instead you use foreach, whose body runs serially on the calling thread, one item at a time.

    If you want to execute a foreach in parallel, you don't want PLINQ, you want Parallel.ForEach():

    Parallel.ForEach(computersToScan, new ParallelOptions { MaxDegreeOfParallelism = 20 },
        computer =>
        {
            //...
        });
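
To make the difference concrete, here is a minimal sketch that times the three shapes discussed above: foreach over a configured parallel query (the pattern from the question), PLINQ with the work moved inside the query, and Parallel.ForEach. ScanHost is a hypothetical stand-in that only sleeps to simulate a blocking network call; it is not the real NetUserEnum wrapper, and the host names and the 50 ms delay are made up for illustration. Raising the thread pool minimum, as the question already does, is assumed here so the pool does not have to ramp up slowly.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ScanComparison
{
    // Hypothetical stand-in for a blocking network call such as NetUserEnum:
    // it only sleeps, to simulate waiting on a slow or unreachable host.
    public static int ScanHost(string host)
    {
        Thread.Sleep(50);
        return host.Length;
    }

    // The pattern from the question: the query is configured as parallel,
    // but the loop body runs on the calling thread, one host at a time.
    public static long ForeachOverParallelQuery(IList<string> hosts)
    {
        var sw = Stopwatch.StartNew();
        var query = hosts.AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .WithDegreeOfParallelism(20);
        foreach (var host in query)
        {
            ScanHost(host);
        }
        return sw.ElapsedMilliseconds;
    }

    // PLINQ with the blocking work inside the query, so it runs on PLINQ's workers.
    public static long PlinqSelect(IList<string> hosts)
    {
        var sw = Stopwatch.StartNew();
        List<int> results = hosts.AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .WithDegreeOfParallelism(20)
            .Select(h => ScanHost(h))
            .ToList();
        return sw.ElapsedMilliseconds;
    }

    // Parallel.ForEach: the loop body itself is what gets parallelized.
    public static long ParallelForEach(IList<string> hosts)
    {
        var sw = Stopwatch.StartNew();
        Parallel.ForEach(hosts, new ParallelOptions { MaxDegreeOfParallelism = 20 },
            host => { ScanHost(host); });
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Same idea as in the question: keep at least 20 worker threads available.
        ThreadPool.GetMinThreads(out int prevWorkers, out int prevPorts);
        ThreadPool.SetMinThreads(Math.Max(prevWorkers, 20), prevPorts);

        var hosts = Enumerable.Range(1, 40).Select(i => "host" + i).ToList();

        Console.WriteLine("foreach over AsParallel(): " + ForeachOverParallelQuery(hosts) + " ms");
        Console.WriteLine("PLINQ Select():            " + PlinqSelect(hosts) + " ms");
        Console.WriteLine("Parallel.ForEach():        " + ParallelForEach(hosts) + " ms");
    }
}
```

On a typical machine the first timing should be roughly the sum of all the simulated waits, while the other two should be much shorter, because up to 20 waits overlap. Exact numbers depend on the machine and the state of the thread pool.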