
How to optimize the performance of iterating over a stream by preloading its items in the background?


I have a situation where I need to process a large number of files one by one on the UI thread. Both loading each file and processing it can take a significant amount of time, and together the files use too much memory to load all at once. This is all running in a .NET Framework 4.8 application and, unfortunately, in a part of the code base that is synchronous.

In essence, my code looks like this:

IEnumerable<ScanFile> scanFiles = GetScanFileStream();

foreach (ScanFile scanFile in scanFiles)
{
    Process(scanFile); // Must be called on the UI thread
}

IEnumerable<ScanFile> GetScanFileStream() =>
    from filePath in Directory.EnumerateFiles("c:\\scans", "*.json")
    select this.LoadScanFile(filePath); // I'd like to run this in the background

Since LoadScanFile takes roughly as long to execute as Process, I'm hoping to cut the total processing time in half by preloading the next file on a background thread while the current one is still being processed on the UI thread.
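To see why half is the best case, suppose there are n files, each taking time L to load and P to process (symbols introduced here for illustration only). Sequentially, every file costs L + P. With the next load overlapped with the current Process call, only the first load and the last Process are exposed, and every step in between costs whichever of the two is longer. This assumes the background load and the UI-thread work don't slow each other down, for example by competing for the same disk or CPU core:

$$T_{\text{seq}} = n(L + P), \qquad T_{\text{overlap}} \approx L + (n-1)\max(L, P) + P$$

With L = P this gives T_seq = 2nL and T_overlap ≈ (n + 1)L, so the ratio (n + 1)/(2n) approaches 1/2 as n grows.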

I tried writing a custom IEnumerable<T> decorator that wraps the original stream and adds this behavior, but the implementation quickly became too complex; I was starting to use semaphores to synchronize between the threads. At that point I stopped pursuing it, thinking there should be a simpler way to get the same effect.
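For comparison, a one-item-ahead decorator can stay fairly small if each background load is represented as a `Task<T>` instead of being coordinated with semaphores. The sketch below is hypothetical: the names `PreloadExtensions` and `SelectPreloaded` are made up for illustration, and it assumes the loader is thread-safe and never needs the UI thread itself (otherwise blocking on it from the UI thread could deadlock). It stays synchronous, so it works with a plain `foreach` on .NET Framework 4.8.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PreloadExtensions
{
    // Hypothetical helper: yields selector(item) for each item, in order, running the
    // selector for the next item on the thread pool while the caller handles the current one.
    // Assumes the selector is thread-safe and does not need the UI thread.
    public static IEnumerable<TResult> SelectPreloaded<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (selector == null) throw new ArgumentNullException(nameof(selector));
        return PreloadIterator(source, selector);
    }

    private static IEnumerable<TResult> PreloadIterator<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        bool hasCurrent = false;
        TResult current = default(TResult);

        foreach (TSource item in source)
        {
            // Start loading this item in the background...
            Task<TResult> next = Task.Run(() => selector(item));

            // ...while the caller processes the previously loaded one.
            if (hasCurrent)
                yield return current;

            // Block until the background load is done; GetResult() rethrows the
            // original exception rather than wrapping it in an AggregateException.
            current = next.GetAwaiter().GetResult();
            hasCurrent = true;
        }

        if (hasCurrent)
            yield return current; // the last item
    }
}
```

With the question's names the loop would become `foreach (ScanFile scanFile in Directory.EnumerateFiles("c:\\scans", "*.json").SelectPreloaded(path => this.LoadScanFile(path))) Process(scanFile);`. At most one load is in flight at a time, so roughly two files are alive at once. A `hasCurrent` flag is used instead of a null check so that a loader returning null is still handled. If the loop is exited early, the load already in flight still runs to completion and its result is discarded.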

I expected this to be achievable with Parallel LINQ (PLINQ) or other constructs built into the BCL and the CLR, but an extensive Google search didn't turn up anything useful.
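One BCL construct that fits this pattern is a bounded `BlockingCollection<T>` from `System.Collections.Concurrent` (available since .NET Framework 4.0): a background task loads items into it, and the UI thread consumes them through `GetConsumingEnumerable()`, which blocks while the buffer is empty. This is a minimal sketch, not a tested library; `BackgroundLoader` and `SelectInBackground` are hypothetical names. With `boundedCapacity` set to 1, one finished item can wait in the buffer while the producer is already loading the next, so up to three items (the one being processed, the buffered one, and the one loading) can be alive at once.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundLoader
{
    // Hypothetical helper: runs `selector` over `source` on a background task and
    // yields the results in order; the producer pauses once the buffer is full.
    public static IEnumerable<TResult> SelectInBackground<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> selector, int boundedCapacity = 1)
    {
        var cts = new CancellationTokenSource();
        var buffer = new BlockingCollection<TResult>(boundedCapacity);

        Task producer = Task.Run(() =>
        {
            try
            {
                foreach (TSource item in source)
                {
                    cts.Token.ThrowIfCancellationRequested();
                    buffer.Add(selector(item), cts.Token); // blocks while the buffer is full
                }
            }
            finally
            {
                buffer.CompleteAdding(); // lets the consumer loop below finish
            }
        });

        try
        {
            foreach (TResult result in buffer.GetConsumingEnumerable())
                yield return result;

            producer.GetAwaiter().GetResult(); // surface a load failure to the caller
        }
        finally
        {
            // Runs on completion, on error, and when the caller stops early:
            // unblock the producer, wait for it to stop, then release resources.
            cts.Cancel();
            try { producer.Wait(); } catch (AggregateException) { }
            buffer.Dispose();
            cts.Dispose();
        }
    }
}
```

Because the method is an iterator, nothing starts until the caller begins enumerating. If the caller stops early while a load is in progress, the `finally` block waits for that single load to finish before returning.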

What solution could you advise that allows me to cut processing time in half?


Solution

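  • You can use an IAsyncEnumerable<T> and make LoadScanFile asynchronous (either by writing it with async/await or by wrapping the synchronous version in Task.Run). Then start loading the next file before yielding the previous one. Async streams need C# 8 (set LangVersion in a .NET Framework project), and on .NET Framework 4.8 the IAsyncEnumerable<T> type comes from the Microsoft.Bcl.AsyncInterfaces NuGet package; the calling code also has to become async.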

    async IAsyncEnumerable<ScanFile> GetScanFileStream()
    {
        ScanFile scanFile = null;
        foreach (var filePath in Directory.EnumerateFiles("c:\\scans", "*.json"))
        {
            // Start loading the next file, e.g. via Task.Run(() => this.LoadScanFile(filePath))
            var scanFileTask = this.LoadScanFileAsync(filePath);
            // If we already have a loaded file, yield it while the next one loads
            if (scanFile != null)
                yield return scanFile;

            scanFile = await scanFileTask;
        }
        if (scanFile != null)    // and yield the last one as well
            yield return scanFile;
    }
    

    Now you can consume it with:

    await foreach (ScanFile scanFile in GetScanFileStream())
    {
        Process(scanFile); // Must be called on the UI thread
    }
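A generic, self-contained version of the same idea may be easier to test in isolation. This is a sketch assuming async streams are available (C# 8 or later); `AsyncPrefetch` and `SelectPrefetchedAsync` are hypothetical names, and the loader is wrapped in `Task.Run`, which is one of the two options the answer mentions. It tracks "have we loaded an item yet" with a flag rather than a null check, so a loader that legitimately returns null is not skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncPrefetch
{
    // Hypothetical generic version of GetScanFileStream: `load` runs on the thread pool,
    // and the load for the next item starts before the previous item is yielded.
    public static async IAsyncEnumerable<TResult> SelectPrefetchedAsync<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> load)
    {
        bool hasCurrent = false;
        TResult current = default;

        foreach (TSource item in source)
        {
            // Start loading this item in the background...
            Task<TResult> next = Task.Run(() => load(item));

            // ...while the consumer processes the previously loaded one.
            if (hasCurrent)
                yield return current;

            current = await next; // wait for the background load without blocking
            hasCurrent = true;
        }

        if (hasCurrent)
            yield return current; // the last item
    }
}
```

From an async UI event handler, usage with the question's names might look like `await foreach (ScanFile scanFile in AsyncPrefetch.SelectPrefetchedAsync(Directory.EnumerateFiles("c:\\scans", "*.json"), path => this.LoadScanFile(path))) Process(scanFile);`. Because `await foreach` resumes on the captured synchronization context by default, `Process` still runs on the UI thread.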