c# .net multithreading task-parallel-library webrequest

Efficient strategy for making multiple web requests and parsing the responses in parallel


My service has a function that fires a web request, parses the response, and performs a lot of computation on it to finally produce a collection.

I now have to invoke that function multiple times to get many collections, which can be aggregated into one later. I guess I could choose either Parallel.ForEach or Task.Factory.StartNew.

Can you advise which would be more efficient for this scenario, which involves both web request processing and computation?


Solution

  • Parallel.ForEach and Task.Factory.StartNew are intended for CPU-bound workloads. For I/O-bound workloads you need asynchrony, which is most conveniently provided by async/await, for example Task.Run(async () => { await ... }). In your case you have both CPU-bound and I/O-bound workloads. The good news is that the async infrastructure also handles CPU-bound workloads well. For example, this is perfectly valid:

    private async void Button1_Click(object sender, EventArgs args)
    {
        var webData = await GetWebData(url); // I/O bound
        var parsedList = await Task.Run(() => ParseWebData(webData)); // CPU bound
        await SaveListToDB(parsedList); // I/O bound
    }
    

    No threads are blocked during the I/O operations, and a thread-pool thread does the CPU-intensive parsing. From a scalability and resource-preservation perspective, you can't do much better than that. But if you are willing to tie up all the resources of your machine to get the best possible performance, leaving nothing free for other processes, then your strategy should be to keep all your processors/cores busy all the time, while performing the maximum number of concurrent I/O operations that the external world can handle (web servers, filesystems, and databases all have limits on how much work they can do at the same time).
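One possible way to sketch the strategy above without any extra library is to start all invocations at once, cap the number of in-flight requests with a SemaphoreSlim, offload the parsing with Task.Run, and aggregate with Task.WhenAll. The helper names GetWebDataAsync and ParseWebData, the '|' payload format, and the limit of 4 concurrent requests are assumptions made for illustration, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledPipeline
{
    // Hypothetical stand-in for the real web request (I/O-bound).
    private static async Task<string> GetWebDataAsync(string url)
    {
        await Task.Delay(50); // simulates latency without blocking a thread
        return $"{url}|a,b,c";
    }

    // Hypothetical stand-in for the real parsing and computation (CPU-bound).
    private static List<string> ParseWebData(string webData)
    {
        return webData.Split('|')[1].Split(',').ToList();
    }

    public static async Task<List<string>> FetchAllAsync(
        IEnumerable<string> urls, int maxConcurrentRequests)
    {
        // Caps how many requests are in flight at the same time.
        using var throttle = new SemaphoreSlim(maxConcurrentRequests);

        var tasks = urls.Select(async url =>
        {
            string webData;
            await throttle.WaitAsync();
            try
            {
                webData = await GetWebDataAsync(url); // I/O bound
            }
            finally
            {
                throttle.Release();
            }
            // CPU bound: runs on a thread-pool thread, outside the throttle.
            return await Task.Run(() => ParseWebData(webData));
        }).ToList();

        // Wait for every invocation, then merge the collections into one.
        List<string>[] collections = await Task.WhenAll(tasks);
        return collections.SelectMany(c => c).ToList();
    }

    public static async Task Main()
    {
        var urls = Enumerable.Range(1, 10).Select(i => $"https://example.com/{i}");
        var items = await FetchAllAsync(urls, maxConcurrentRequests: 4);
        Console.WriteLine(items.Count);
    }
}
```

The throttle is released before parsing starts, so a slow computation never holds back the next request. The right limit depends on what the remote server tolerates and would need to be found by measurement.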

    A good tool for implementing this strategy is the TPL Dataflow library, which is built into .NET Core and available as a package for .NET Framework. If you know nothing about it, it has a learning curve, though not a very steep one. After 2-3 days of study you should feel fairly confident that you can write robust, production-quality code with it. It has all the tools you need for splitting, joining, transforming, buffering, and parallelizing your workload, in a way that lets you feel in control of the process without micromanaging everything.
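As a rough illustration of what such a pipeline might look like, the sketch below links a download block, a parsing block, and a collecting block, each with its own degree of parallelism. The helpers GetWebDataAsync and ParseWebData, the '|' payload format, and the chosen parallelism values are hypothetical placeholders; only the System.Threading.Tasks.Dataflow types are real:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowPipeline
{
    // Hypothetical stand-in for the real web request (I/O-bound).
    private static async Task<string> GetWebDataAsync(string url)
    {
        await Task.Delay(50);
        return $"{url}|a,b,c";
    }

    // Hypothetical stand-in for the real parsing and computation (CPU-bound).
    private static List<string> ParseWebData(string webData)
    {
        return webData.Split('|')[1].Split(',').ToList();
    }

    public static async Task<List<string>> RunAsync(IEnumerable<string> urls)
    {
        // I/O stage: up to 4 concurrent requests (an assumed limit).
        var downloader = new TransformBlock<string, string>(
            async url => await GetWebDataAsync(url),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // CPU stage: one worker per core.
        var parser = new TransformBlock<string, List<string>>(
            webData => ParseWebData(webData),
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount
            });

        // Aggregation stage: default parallelism is 1, so no locking is needed.
        var results = new List<string>();
        var collector = new ActionBlock<List<string>>(list => results.AddRange(list));

        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        downloader.LinkTo(parser, linkOptions);
        parser.LinkTo(collector, linkOptions);

        foreach (var url in urls)
        {
            downloader.Post(url);
        }
        downloader.Complete();      // no more input
        await collector.Completion; // wait until everything has flowed through
        return results;
    }

    public static async Task Main()
    {
        var urls = Enumerable.Range(1, 10).Select(i => $"https://example.com/{i}");
        var items = await RunAsync(urls);
        Console.WriteLine(items.Count);
    }
}
```

Because each block has its own MaxDegreeOfParallelism, the I/O limit and the CPU limit can be tuned independently, which is the main appeal over a single Parallel.ForEach loop.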