c# performance asynchronous async-await concurrency

Await thousands of Tasks


I have an application that converts data; there are often 1,000 to 30,000 files.

I need to do 3 steps:

  1. Copy a file (replacing some text in it)
  2. Make a web request with WebClient to download a file (I send the copied file to a web server, which converts it to another format)
  3. Take the downloaded file and change some of its content

All three steps involve I/O, so I used async/await methods:

var tasks = files.Select(async (file) =>
{
    Item item = await createtempFile(file).ConfigureAwait(false);
    await convert(item).ConfigureAwait(false);
    await clean(item).ConfigureAwait(false);
}).ToList();

await Task.WhenAll(tasks).ConfigureAwait(false);

I don't know if this is best practice, because I create more than a thousand tasks. I thought about splitting the three steps like this:

// Step 1: create all temp files and collect the resulting items.
List<Item> items = new List<Item>();
var createTasks = files.Select(async (file) =>
{
    Item item = await createtempFile(file, ext).ConfigureAwait(false);
    lock (items)
        items.Add(item);
}).ToList();

await Task.WhenAll(createTasks).ConfigureAwait(false);

// Step 2: convert every item via the web server.
var convertTasks = items.Select(async (item) =>
{
    await convert(item, baseAddress, ext).ConfigureAwait(false);
}).ToList();

await Task.WhenAll(convertTasks).ConfigureAwait(false);

// Step 3: clean up the downloaded files.
var cleanTasks = items.Select(async (item) =>
{
    await clean(targetFile, item.Doctype, ext).ConfigureAwait(false);
}).ToList();

await Task.WhenAll(cleanTasks).ConfigureAwait(false);

But that doesn't seem to be better or faster, because I create thousands of tasks three times over.

Should I throttle the creation of tasks, for example in chunks of 100? Or am I just overthinking it, and creating thousands of tasks is fine?

The CPU is mostly idle, peaking at 2-4%, so I wondered whether too many awaits or context switches are the problem.

Maybe there are too many WebRequest calls, because the web server/web service can't handle thousands of requests simultaneously, and I should throttle only the web requests?

I have already increased the .NET maxconnection setting in the app.config file.


Solution

  • As commenters have correctly noted, you're overthinking it. The .NET runtime has absolutely no problem tracking thousands of tasks.

    However, you might want to consider using a TPL Dataflow pipeline, which would enable you to easily have different concurrency levels for different operations ("blocks") in your pipeline.
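
A minimal sketch of what such a pipeline could look like, using the System.Threading.Tasks.Dataflow library. The methods CreateTempFileAsync, ConvertAsync and CleanAsync are hypothetical stand-ins for the question's createtempFile, convert and clean (here they only simulate I/O with Task.Delay), and the parallelism limits are arbitrary illustration values, not recommendations:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public sealed class Item
{
    public string Path { get; set; }
}

public static class ConversionPipeline
{
    // Hypothetical stand-ins for the question's three steps; each just simulates I/O.
    static async Task<Item> CreateTempFileAsync(string file)
    {
        await Task.Delay(1).ConfigureAwait(false);
        return new Item { Path = file + ".tmp" };
    }

    static async Task<Item> ConvertAsync(Item item)
    {
        await Task.Delay(5).ConfigureAwait(false); // simulated web request
        return item;
    }

    static Task CleanAsync(Item item) => Task.Delay(1);

    // Runs all files through the three stages and returns how many finished.
    public static async Task<int> RunAsync(IEnumerable<string> files)
    {
        int completed = 0;

        // Each block has its own concurrency limit (values are assumptions).
        var createBlock = new TransformBlock<string, Item>(
            file => CreateTempFileAsync(file),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 16 });

        // Only the web-request stage is throttled tightly.
        var convertBlock = new TransformBlock<Item, Item>(
            item => ConvertAsync(item),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        var cleanBlock = new ActionBlock<Item>(
            async item =>
            {
                await CleanAsync(item).ConfigureAwait(false);
                Interlocked.Increment(ref completed);
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 16 });

        // Propagate completion so finishing the first block drains the whole pipeline.
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        createBlock.LinkTo(convertBlock, linkOptions);
        convertBlock.LinkTo(cleanBlock, linkOptions);

        foreach (var file in files)
            await createBlock.SendAsync(file).ConfigureAwait(false);

        createBlock.Complete();
        await cleanBlock.Completion.ConfigureAwait(false);
        return completed;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var files = Enumerable.Range(0, 100).Select(i => "file" + i + ".dat");
        int done = await ConversionPipeline.RunAsync(files);
        Console.WriteLine("Processed " + done + " files");
    }
}
```

With this shape, lowering MaxDegreeOfParallelism on convertBlock limits the simultaneous requests hitting the web server, while the file-copy and cleanup stages keep running at their own rate.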