Search code examples
c#.netasync-await.net-4.5

Is LogicalOperationStack incompatible with async in .Net 4.5


Trace.CorrelationManager.LogicalOperationStack enables having nested logical operation identifiers where the most common case is logging (NDC). Should it still work with async-await?

Here's a simple example using LogicalFlow which is my simple wrapper over the LogicalOperationStack:

private static void Main() => OuterOperationAsync().GetAwaiter().GetResult();

private static async Task OuterOperationAsync()
{
    Console.WriteLine(LogicalFlow.CurrentOperationId);
    using (LogicalFlow.StartScope())
    {
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
        await InnerOperationAsync();
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
        await InnerOperationAsync();
        Console.WriteLine("\t" + LogicalFlow.CurrentOperationId);
    }
    Console.WriteLine(LogicalFlow.CurrentOperationId);
}

private static async Task InnerOperationAsync()
{
    using (LogicalFlow.StartScope())
    {
        await Task.Delay(100);
    }
}

LogicalFlow:

public static class LogicalFlow
{
    public static Guid CurrentOperationId =>
        Trace.CorrelationManager.LogicalOperationStack.Count > 0
            ? (Guid) Trace.CorrelationManager.LogicalOperationStack.Peek()
            : Guid.Empty;

    public static IDisposable StartScope()
    {
        Trace.CorrelationManager.StartLogicalOperation();
        return new Stopper();
    }

    private static void StopScope() => 
        Trace.CorrelationManager.StopLogicalOperation();

    private class Stopper : IDisposable
    {
        private bool _isDisposed;
        public void Dispose()
        {
            if (!_isDisposed)
            {
                StopScope();
                _isDisposed = true;
            }
        }
    }
}

Output:

00000000-0000-0000-0000-000000000000
    49985135-1e39-404c-834a-9f12026d9b65
    54674452-e1c5-4b1b-91ed-6bd6ea725b98
    c6ec00fd-bff8-4bde-bf70-e073b6714ae5
54674452-e1c5-4b1b-91ed-6bd6ea725b98

The specific values don't really matter, but as I understand it both the outer lines should show Guid.Empty (i.e. 00000000-0000-0000-0000-000000000000) and the inner lines should show the same Guid value.

You might say that LogicalOperationStack is using a Stack which is not thread-safe and that's why the output is wrong. But while that's true in general, in this case there's never more than a single thread accessing the LogicalOperationStack at the same time (every async operation is awaited when called and no use of combinators such as Task.WhenAll)

The issue is that LogicalOperationStack is stored in the CallContext which has a copy-on-write behavior. That means that as long as you don't explicitly set something in the CallContext (and you don't when you add to an existing stack with StartLogicalOperation) you're using the parent context and not your own.

This can be shown by simply setting anything into the CallContext before adding to the existing stack. For example if we changed StartScope to this:

public static IDisposable StartScope()
{
    CallContext.LogicalSetData("Bar", "Arnon");
    Trace.CorrelationManager.StartLogicalOperation();
    return new Stopper();
}

The output is:

00000000-0000-0000-0000-000000000000
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
    fdc22318-53ef-4ae5-83ff-6c3e3864e37a
00000000-0000-0000-0000-000000000000

Note: I'm not suggesting anyone actually do this. The real practical solution would be to use an ImmutableStack instead of the LogicalOperationStack as it's both thread-safe and since it's immutable when you call Pop you get back a new ImmutableStack that you then need to set back into the CallContext. A full implementation is available as an answer to this question: Tracking c#/.NET tasks flow

So, should LogicalOperationStack work with async and it's just a bug? Is LogicalOperationStack just not meant for the async world? Or am I missing something?


Update: Using Task.Delay is apparently confusing as it uses System.Threading.Timer which captures the ExecutionContext internally. Using await Task.Yield(); instead of await Task.Delay(100); makes the example easier to understand.


Solution

  • Yes, LogicalOperationStack should work with async-await and it is a bug that it doesn't.

    I've contacted the relevant developer at Microsoft and his response was this:

    "I wasn't aware of this, but it does seem broken. The copy-on-write logic is supposed to behave exactly as if we'd really created a copy of the ExecutionContext on entry into the method. However, copying the ExecutionContext would have created a deep copy of the CorrelationManager context, as it's special-cased in CallContext.Clone(). We don't take that into account in the copy-on-write logic."

    Moreover, he recommended using the new System.Threading.AsyncLocal<T> class added in .Net 4.6 instead which should handle that issue correctly.

    So, I went ahead and implemented LogicalFlow on top of an AsyncLocal instead of the LogicalOperationStack using VS2015 RC and .Net 4.6:

    public static class LogicalFlow
    {
        private static AsyncLocal<Stack> _asyncLogicalOperationStack = new AsyncLocal<Stack>();
    
        private static Stack AsyncLogicalOperationStack
        {
            get
            {
                if (_asyncLogicalOperationStack.Value == null)
                {
                    _asyncLogicalOperationStack.Value = new Stack();
                }
    
                return _asyncLogicalOperationStack.Value;
            }
        }
    
        public static Guid CurrentOperationId =>
            AsyncLogicalOperationStack.Count > 0
                ? (Guid)AsyncLogicalOperationStack.Peek()
                : Guid.Empty;
    
        public static IDisposable StartScope()
        {
            AsyncLogicalOperationStack.Push(Guid.NewGuid());
            return new Stopper();
        }
    
        private static void StopScope() =>
            AsyncLogicalOperationStack.Pop();
    }
    

    And the output for the same test is indeed as it should be:

    00000000-0000-0000-0000-000000000000
        ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
        ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
        ae90c3e3-c801-4bc8-bc34-9bccfc2b692a
    00000000-0000-0000-0000-000000000000