Search code examples
asynchronousasync-awaitado.netsystem.io.file

ADO: Async all the way down the tubes?


Okay, so "async all the way down" is the mandate. But when is it problematic?

For example, if you have limited access to a resource, as in a DbConnection or a file, when do you stop using async methods in favor of synchronous?

Let's review the complexity of an asynchronous database call: (Not putting .ConfigureAwait(false) for readability.)

// Step 1: Ok, no big deal, our connection is closed, let's open it and wait.
await connection.OpenAsync();
// Connection is open!  Let's do some work.

// Step 2: Acquire a reader.
using(var reader = await command.ExecuteReaderAsync())
{
    // Step 3: Start reading results.
    while(await reader.ReadAsync())
    {
        // get the data.
    }
}

Steps:

  1. Should be reasonably innocuous and nothing to worry about.

  2. But now we've acquired an open connection in a potentially limited connection pool. What if when waiting for step 2, other long running tasks are at the head of the line in the task scheduler?

  3. Even worse now, we await with an open connection (and most likely added latency).

Aren't we holding open a connection longer than necessary? Isn't this an undesirable result? Wouldn't it be better to use synchronous methods to lessen the overall connection time, ultimately resulting in our data driven application performing better?

Of course I understand that async doesn't mean faster but async methods provide the opportunity for more total throughput. But as I've observed, there can definitely be weirdness when there are tasks scheduled in-between awaits that ultimately delay the operation, and essentially behave like blocking because of the limitations of the underlying resource.

[Note: this question is focused on ADO, but this also applies to file reads and writes.]

Hoping for some deeper insight. Thank you.


Solution

  • Significant amounts of iteration introduce significant added latency and extra CPU usage

    See http://telegra.ph/SqlDataReader-ReadAsync-vs-Read-04-18 for details.

    As suspected:

    Using async does not come without cost and requires consideration. Certain types of operations lend themselves well to async, and others are problematic (for what should be obvious reasons).

    High volume synchronous/blocking code has it's downsides, but for the most part is well managed by modern threading:

    Testing / Profiling

    4 x 100 paralleled queries, 1000 records each query.

    Performance Profile for Synchronous Query

    CPU Performance Profile for Synchronous Query Average Query: 00:00:00.6731697, Total Time: 00:00:25.1435656

    Performance Profile for Async Setup with Synchronous Read

    CPU Performance Profile for Async Setup with Synchronous Read Average Query: 00:00:01.4122918, Total Time: 00:00:30.2188467

    Performance Profile for Fully Async Query

    CPU Performance Profile for Fully Async Query Average Query: 00:00:02.6879162, Total Time: 00:00:32.6702872

    Assessment

    The above results were run on SQL Server 2008 R2 using a .NET Core 2 console application. I invite anyone who has access to a modern instance of SQL Server to replicate these tests to see if there is a reversal in trend. If you find my testing method flawed, please comment so I correct and retest.

    As you can easily see in the results. The more asynchronous operations we introduce, the longer the the queries take, and the longer the total time to complete. Even worse, fully asynchronous uses more CPU overhead which is counter productive to the idea that using async tasks would provide more available thread time. This overhead could be due to how I'm running these tests, but it's important to treat each test in a similar way to compare. Again, if anyone has a way to prove that async is better, please do.

    I'm proposing here that "async all the way" has it's limitations and should be seriously scrutinized at certain iterative levels (like file, or data access).