c#asynchronous async-await task-parallel-library task

Proper way to implement methods that return Task<T>

For simplicity let's imagine we have a method that should return an object while doing some heavy operation. There're two ways to implement:

public Task<object> Foo()
{
    return Task.Run(() =>
    {
        // some heavy synchronous stuff.

        return new object();
    }
}

And

public async Task<object> Foo()
{
    return await Task.Run(() =>
    {
        // some heavy stuff
        return new object();
    }
}

After examining generated IL there're two completely different things generated:

.method public hidebysig 
    instance class [mscorlib]System.Threading.Tasks.Task`1<object> Foo () cil managed 
{
    // Method begins at RVA 0x2050
    // Code size 42 (0x2a)
    .maxstack 2
    .locals init (
        [0] class [mscorlib]System.Threading.Tasks.Task`1<object>
    )

    IL_0000: nop
    IL_0001: ldsfld class [mscorlib]System.Func`1<object> AsyncTest.Class1/'<>c'::'<>9__0_0'
    IL_0006: dup
    IL_0007: brtrue.s IL_0020

    IL_0009: pop
    IL_000a: ldsfld class AsyncTest.Class1/'<>c' AsyncTest.Class1/'<>c'::'<>9'
    IL_000f: ldftn instance object AsyncTest.Class1/'<>c'::'<Foo>b__0_0'()
    IL_0015: newobj instance void class [mscorlib]System.Func`1<object>::.ctor(object, native int)
    IL_001a: dup
    IL_001b: stsfld class [mscorlib]System.Func`1<object> AsyncTest.Class1/'<>c'::'<>9__0_0'

    IL_0020: call class [mscorlib]System.Threading.Tasks.Task`1<!!0> [mscorlib]System.Threading.Tasks.Task::Run<object>(class [mscorlib]System.Func`1<!!0>)
    IL_0025: stloc.0
    IL_0026: br.s IL_0028

    IL_0028: ldloc.0
    IL_0029: ret
}

And

.method public hidebysig 
    instance class [mscorlib]System.Threading.Tasks.Task`1<object> Foo () cil managed 
{
    .custom instance void [mscorlib]System.Runtime.CompilerServices.AsyncStateMachineAttribute::.ctor(class [mscorlib]System.Type) = (
        01 00 1a 41 73 79 6e 63 54 65 73 74 2e 43 6c 61
        73 73 31 2b 3c 42 61 72 3e 64 5f 5f 31 00 00
    )
    .custom instance void [mscorlib]System.Diagnostics.DebuggerStepThroughAttribute::.ctor() = (
        01 00 00 00
    )
    // Method begins at RVA 0x2088
    // Code size 59 (0x3b)
    .maxstack 2
    .locals init (
        [0] class AsyncTest.Class1/'<Foo>d__1',
        [1] valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<object>
    )

    IL_0000: newobj instance void AsyncTest.Class1/'<Foo>d__1'::.ctor()
    IL_0005: stloc.0
    IL_0006: ldloc.0
    IL_0007: ldarg.0
    IL_0008: stfld class AsyncTest.Class1 AsyncTest.Class1/'<Foo>d__1'::'<>4__this'
    IL_000d: ldloc.0
    IL_000e: call valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<!0> valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<object>::Create()
    IL_0013: stfld valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<object> AsyncTest.Class1/'<Foo>d__1'::'<>t__builder'
    IL_0018: ldloc.0
    IL_0019: ldc.i4.m1
    IL_001a: stfld int32 AsyncTest.Class1/'<Foo>d__1'::'<>1__state'
    IL_001f: ldloc.0
    IL_0020: ldfld valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<object> AsyncTest.Class1/'<Foo>d__1'::'<>t__builder'
    IL_0025: stloc.1
    IL_0026: ldloca.s 1
    IL_0028: ldloca.s 0
    IL_002a: call instance void valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<object>::Start<class AsyncTest.Class1/'<Foo>d__1'>(!!0&)
    IL_002f: ldloc.0
    IL_0030: ldflda valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<object> AsyncTest.Class1/'<Foo>d__1'::'<>t__builder'
    IL_0035: call instance class [mscorlib]System.Threading.Tasks.Task`1<!0> valuetype [mscorlib]System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1<object>::get_Task()
    IL_003a: ret
}

As you can see in the first case logic is straightforward, lambda function is created and then call to the Task.Run is generated and the result is returned. In the second example instance of AsyncTaskMethodBuilder is created and then task is actually built and returned. Since I always expected foo method to be called as await Foo() at some higher level, I have always used the first example. However, I see the latter more often. So which approach is correct? What pros and cons do each have?

Real world example

Let's say we have UserStore which has method Task<User> GetUserByNameAsync(string userName) which is used inside web api controller like:

public async Task<IHttpActionResult> FindUser(string userName)
{
    var user = await _userStore.GetUserByNameAsync(userName);

    if (user == null)
    {
        return NotFound();
    }

    return Ok(user);
}

Which implementation of Task<User> GetUserByNameAsync(string userName) would be correct?

public Task<User> GetUserByNameAsync(string userName)
{
    return _dbContext.Users.FirstOrDefaultAsync(user => user.UserName == userName);
}

public async Task<User> GetUserNameAsync(string userName)
{
    return await _dbContext.Users.FirstOrDefaultAsync(user => user.UserName == username);
}

Solution

As you can see from the IL, async/await creates a state machine (and an additional Task) even in the case of trivial async tail calls, i.e.

return await Task.Run(...);

This leads to performance degradation due to additional instructions and allocations. So the rule of thumb is: if your method ends with await ... or return await ..., and it is the one and only await statement, then it's generally safe to remove the async keyword and directly return the Task that you were going to await.

One potentially unintended consequence of doing this is that if an exception is thrown inside the returned Task, the outer method will not appear in the stack trace.

There's a hidden gotcha in the return await ... case too though. If the awaiter is not explicitly configured not to continue on captured context via ConfigureAwait(false), then the outer Task (the one created for you by the async state machine) cannot transition to completed state until the final postback to the SynchronizationContext (captured just before the await) has finished. This serves no real purpose but can still result in a deadlock if you block on the outer task for some reason (here's a detailed explanation of what happens in such case).