Search code examples
c#.netasynchronous.net-core-2.1compiler-generated

C# async/await efficiency (or abusing) on reading DbDataReader


Stumbled upon a relatively often used piece of code which seemed inefficient at first. (I know optimization could be evil sometimes, but I was wondering)

introduction part - fairly simple SP execution + reading the returned data:

try
{
    await connection.OpenAsync();
    using (var command = connection.CreateCommand())
    {
        command.CommandText = sql.ToString();
        command.Parameters.AddRange(sqlParameters.ToArray());

        var reader = await command.ExecuteReaderAsync();
        if (reader.HasRows)
        {
            while (await reader.ReadAsync())
            {
                 var item = await GetProjectElement(reader);
                 list.Add(item);
            }
         }

         reader.Dispose();
     }      
}
finally
{
    connection.Close();
}

What worries me is the function

await GetProjectElement(reader)

private async Task<Project> GetProjectElement(DbDataReader reader)
{
    var item = new Project
    {
        Id = await reader.GetFieldValueAsync<int>(1),
        ParentId = await reader.IsDBNullAsync(2) ? default(int?) : await reader.GetFieldValueAsync<int>(2),
        Name = await reader.IsDBNullAsync(3) ? default(string) : await reader.GetFieldValueAsync<string>(3),
        Description = await reader.IsDBNullAsync(4) ? default(string) : await reader.GetFieldValueAsync<string>(4),
        Address = await reader.IsDBNullAsync(5) ? default(string) : await reader.GetFieldValueAsync<string>(5),
        City = await reader.IsDBNullAsync(6) ? default(string) : await reader.GetFieldValueAsync<string>(6),
        PostalCode = await reader.IsDBNullAsync(7) ? default(string) : await reader.GetFieldValueAsync<string>(7),
        Type = (ProjectTypeEnum)(await reader.GetFieldValueAsync<byte>(8)),
        StartDate = await reader.IsDBNullAsync(9) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(9),
        EstimatedEndDate = await reader.IsDBNullAsync(10) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(10),
        ActualEndDate = await reader.IsDBNullAsync(11) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(11),
        WebsiteUrl = await reader.IsDBNullAsync(12) ? default(string) : await reader.GetFieldValueAsync<string>(12),
        Email = await reader.IsDBNullAsync(13) ? default(string) : await reader.GetFieldValueAsync<string>(13),
        PhoneNumber = await reader.IsDBNullAsync(14) ? default(string) : await reader.GetFieldValueAsync<string>(14),
        MobilePhoneNumber = await reader.IsDBNullAsync(15) ? default(string) : await reader.GetFieldValueAsync<string>(15),
        Key = await reader.IsDBNullAsync(16) ? default(Guid?) : await reader.GetFieldValueAsync<Guid>(16),
        OrganizationElementId = await reader.GetFieldValueAsync<int>(17),
        CompanyOrganizationElementId = await reader.IsDBNullAsync(18) ? default(int?) : await reader.GetFieldValueAsync<int>(18),
        IsArchived = await reader.GetFieldValueAsync<bool>(20),
        IsDeleted = await reader.GetFieldValueAsync<bool>(21),
        CreatedOn = await reader.GetFieldValueAsync<DateTime>(22),
        CreatedBy = await reader.GetFieldValueAsync<string>(23),
        ModifiedOn = await reader.IsDBNullAsync(24) ? default(DateTime?) : await reader.GetFieldValueAsync<DateTime>(24),
        ModifiedBy = await reader.IsDBNullAsync(25) ? default(string) : await reader.GetFieldValueAsync<string>(25)
    };

    return item;
}

as you can see there are a LOT of await calls which the compiler turns into state machines, doesn't it?

You can find a simplified version of what the compiler generated code looks like here. Plenty of GOTOs which means context switching over and over again.

Since the SP was executed without specifying a CommandBehavior - the data will be in non-sequential mode. (probably the reason is a table row isn't supposed to be very large in bytes for this case of Project link)


My questions are:

1) is this abusing of the async/await without an obvious reason, because the row data is already buffered in memory, right?

2) is Task<Project> a pure overhead in this scenario?

3) would this approach actually have a worse performance compared to one without awaiting


Final thoughs: If I get things right, we would want to use CommandBehavior.SequentialAccess for large table rows where content may exceed a reasonable length thus making us want to read it asynchronously? (like storing varbinary(max) or blobs)


Solution

  • As others have noted, GOTO does not cause a context switch, and are quite fast.

    1) is this abusing of the async/await without an obvious reason, because the row data is already buffered in memory, right?

    ADO.NET allows implementors a lot of leeway in how exactly they implement the base types. Perhaps the row is in memory, perhaps not.

    2) is Task a pure overhead in this scenario?

    Yes, if the operation is actually synchronous. Which is an implementation detail of your ADO.NET provider.

    Note that the state machine and await add practically no overhead here; there's an async fast path where the code just keeps executing synchronously if possible.

    3) would this approach actually have a worse performance compared to one without awaiting

    Probably not. First off, the performance impact is not going to be driven by what little CPU work is done to call each method and continue executing synchronously. Any performance impact you'd see would be due to the additional Task<T> instances thrown on the Gen0 heap and having to be garbage collected. This is the reason there's now a ValueTask<T>.

    But even that performance impact would most likely be unnoticeable next to the network I/O call to your database server. That said, if you want to know the micro-performance penalties, the Zen of Async is a classic.