Tags: c#, asynchronous, async-await, task, cancellationtokensource

A call to CancellationTokenSource.Cancel never returns


I have a situation where a call to CancellationTokenSource.Cancel never returns. Instead, after Cancel is called (and before it returns), execution continues with the cancellation handling of the code that is being cancelled. If that code does not subsequently await anything, the caller that originally called Cancel never gets control back. This is very strange. I would expect Cancel to simply record the cancellation request and return immediately, independent of the cancellation itself. The fact that the thread on which Cancel is called ends up executing code that belongs to the operation being cancelled, and does so before returning to the caller of Cancel, looks like a bug in the framework.

Here is how this goes:

  1. There is a piece of code, let’s call it “the worker code”, that is waiting on some async code. To keep things simple, let’s say this code is awaiting a Task.Delay:

    try
    {
        await Task.Delay(5000, cancellationToken);
        // … 
    }
    catch (OperationCanceledException)
    {
        // ….
    }
    

Just before “the worker code” invokes Task.Delay it is executing on thread T1. The continuation (that is, the line following the “await” or the block inside the catch) will be executed later, either on T1 or on some other thread, depending on a series of factors.

  2. There is another piece of code, let’s call it “the client code”, that decides to cancel the Task.Delay. This code calls Cancel on the CancellationTokenSource that produced cancellationToken. The call to Cancel is made on thread T2.

I would expect thread T2 to continue by returning to the caller of Cancel. I also expect to see the content of catch (OperationCanceledException) executed very soon on thread T1 or on some thread other than T2.

What happens next is surprising. I see that on thread T2, after Cancel is called, the execution continues immediately with the block inside catch (OperationCanceledException). And that happens while Cancel is still on the call stack. It is as if the call to Cancel is hijacked by the code that is being cancelled. Here's a screenshot of Visual Studio showing this call stack:

[Screenshot: call stack]
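
The observation above can be reproduced with a small console sketch. It is not the code from the GitHub repository; the class name CancelRepro and its Run method are made up for illustration, and a dedicated thread stands in for T2. On runtimes where the behavior described here occurs, the catch block is expected to run on the cancelling thread before Cancel returns.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical repro; CancelRepro and Run are illustrative names.
public static class CancelRepro
{
    // Returns the thread that called Cancel, the thread that ran the catch
    // block, and whether the catch block ran before Cancel returned.
    public static (int CancelThread, int CatchThread, bool CatchRanInsideCancel) Run()
    {
        using (var cts = new CancellationTokenSource())
        {
            int cancelThread = -1, catchThread = -1;
            bool cancelReturned = false, catchRanInsideCancel = false;

            async Task WorkerAsync()
            {
                try
                {
                    await Task.Delay(5000, cts.Token).ConfigureAwait(false);
                }
                catch (OperationCanceledException)
                {
                    catchThread = Environment.CurrentManagedThreadId;
                    catchRanInsideCancel = !cancelReturned;
                }
            }

            // "T1": runs the worker up to the await, then gets control back.
            Task worker = WorkerAsync();

            // "T2": a plain thread with no SynchronizationContext.
            var t2 = new Thread(() =>
            {
                cancelThread = Environment.CurrentManagedThreadId;
                cts.Cancel();
                cancelReturned = true;
            });
            t2.Start();
            t2.Join();
            worker.Wait();

            return (cancelThread, catchThread, catchRanInsideCancel);
        }
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine($"Cancel on thread {r.CancelThread}, catch on thread {r.CatchThread}, inside Cancel: {r.CatchRanInsideCancel}");
    }
}
```

If the two thread IDs match and the last value is True, the worker's catch block was executed by the thread that called Cancel.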

More context

Here is some more context about what the actual code does: There is a “worker code” that accumulates requests. Requests are submitted by some “client code”. Every few seconds “the worker code” processes these requests, and the processed requests are removed from the queue. Once in a while, however, “the client code” decides that it has reached a point where it wants requests to be processed immediately. To communicate this to “the worker code” it calls a method Jolt that “the worker code” provides. Jolt implements this feature by cancelling the Task.Delay that the worker’s main loop is awaiting. The worker has its Task.Delay cancelled and proceeds to process the requests that were already queued.

The actual code was stripped down to its simplest form and the code is available on GitHub.

Environment

The issue can be reproduced in console apps, background agents for Universal Apps for Windows and background agents for Universal Apps for Windows Phone 8.1.

The issue cannot be reproduced in Universal apps for Windows where the code works as I would expect and the call to Cancel returns immediately.


Solution

  • CancellationTokenSource.Cancel doesn't simply set the IsCancellationRequested flag.

    The CancellationToken struct has a Register method, which lets you register callbacks that will be called on cancellation. And these callbacks are called by CancellationTokenSource.Cancel.
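
A minimal sketch of that point, using only the public Register(Action) overload: the callback runs on the thread that calls Cancel, and it has already run by the time Cancel returns. The class and method names are illustrative.

```csharp
using System;
using System.Threading;

// Illustrative names; only CancellationTokenSource and Register are framework APIs.
public static class RegisterDemo
{
    // Returns true if the registered callback ran on the cancelling thread
    // before Cancel returned.
    public static bool CallbackRunsInsideCancel()
    {
        using (var cts = new CancellationTokenSource())
        {
            int callbackThread = -1;
            bool ran = false;

            cts.Token.Register(() =>
            {
                callbackThread = Environment.CurrentManagedThreadId;
                ran = true;
            });

            cts.Cancel();

            // No waiting here: Cancel invokes the callbacks itself.
            return ran && callbackThread == Environment.CurrentManagedThreadId;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CallbackRunsInsideCancel()); // prints "True"
    }
}
```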

    Let's take a look at the source code:

    public void Cancel()
    {
        Cancel(false);
    }
    
    public void Cancel(bool throwOnFirstException)
    {
        ThrowIfDisposed();
        NotifyCancellation(throwOnFirstException);            
    }
    

    Here's the NotifyCancellation method:

    private void NotifyCancellation(bool throwOnFirstException)
    {
        // fast-path test to check if Notify has been called previously
        if (IsCancellationRequested)
            return;
    
        // If we're the first to signal cancellation, do the main extra work.
        if (Interlocked.CompareExchange(ref m_state, NOTIFYING, NOT_CANCELED) == NOT_CANCELED)
        {
            // Dispose of the timer, if any
            Timer timer = m_timer;
            if(timer != null) timer.Dispose();
    
            //record the threadID being used for running the callbacks.
            ThreadIDExecutingCallbacks = Thread.CurrentThread.ManagedThreadId;
    
            //If the kernel event is null at this point, it will be set during lazy construction.
            if (m_kernelEvent != null)
                m_kernelEvent.Set(); // update the MRE value.
    
            // - late enlisters to the Canceled event will have their callbacks called immediately in the Register() methods.
            // - Callbacks are not called inside a lock.
            // - After transition, no more delegates will be added to the 
            // - list of handlers, and hence it can be consumed and cleared at leisure by ExecuteCallbackHandlers.
            ExecuteCallbackHandlers(throwOnFirstException);
            Contract.Assert(IsCancellationCompleted, "Expected cancellation to have finished");
        }
    }
    

    Ok, now the catch is that ExecuteCallbackHandlers can execute the callbacks either on the target context, or in the current context. I'll let you take a look at the ExecuteCallbackHandlers method source code as it's a bit too long to include here. But the interesting part is:

    if (m_executingCallback.TargetSyncContext != null)
    {
    
        m_executingCallback.TargetSyncContext.Send(CancellationCallbackCoreWork_OnSyncContext, args);
        // CancellationCallbackCoreWork_OnSyncContext may have altered ThreadIDExecutingCallbacks, so reset it. 
        ThreadIDExecutingCallbacks = Thread.CurrentThread.ManagedThreadId;
    }
    else
    {
        CancellationCallbackCoreWork(args);
    }
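
To see the two branches from the outside, here is a small sketch that uses the public Register(Action, bool useSynchronizationContext) overload. RecordingContext is a hypothetical SynchronizationContext written for this example; it only counts calls to Send and then runs the delegate inline.

```csharp
using System;
using System.Threading;

// Hypothetical context that records whether Cancel went through Send.
public sealed class RecordingContext : SynchronizationContext
{
    public int SendCount;

    public override void Send(SendOrPostCallback d, object state)
    {
        SendCount++;
        d(state); // run inline on the calling thread
    }
}

public static class SyncContextDemo
{
    // Returns how many times Cancel used the context's Send method
    // to invoke the registered callback.
    public static int SendCalls(bool useSynchronizationContext)
    {
        var previous = SynchronizationContext.Current;
        var context = new RecordingContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            using (var cts = new CancellationTokenSource())
            {
                // With true, the current context is captured at registration time.
                cts.Token.Register(() => { }, useSynchronizationContext);
                cts.Cancel();
                return context.SendCount;
            }
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    public static void Main()
    {
        Console.WriteLine(SendCalls(true));  // prints "1"
        Console.WriteLine(SendCalls(false)); // prints "0"
    }
}
```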
    

    I guess now you're starting to understand where I'm going to look next... Task.Delay of course. Let's look at its source code:

    // Register our cancellation token, if necessary.
    if (cancellationToken.CanBeCanceled)
    {
        promise.Registration = cancellationToken.InternalRegisterWithoutEC(state => ((DelayPromise)state).Complete(), promise);
    }
    

    Hmmm... what's that InternalRegisterWithoutEC method?

    internal CancellationTokenRegistration InternalRegisterWithoutEC(Action<object> callback, Object state)
    {
        return Register(
            callback,
            state,
            false, // useSyncContext=false
            false  // useExecutionContext=false
         );
    }
    

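    Argh. useSyncContext=false - this explains the behavior you're seeing, as the TargetSyncContext property used in ExecuteCallbackHandlers will be null. Since no synchronization context is used, the cancellation callback runs synchronously on the thread that called CancellationTokenSource.Cancel. That callback completes the Task.Delay task, and the continuation of your await (here, the catch block) can then run inline on that same thread before Cancel returns.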
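
If the caller must not run the worker's code, one possible approach (not part of the explanation above, and only a sketch) is to call Cancel from a thread-pool thread, so that whatever runs inside Cancel runs there instead. CancelInBackground is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundCancel
{
    // Hypothetical helper: calls Cancel on a thread-pool thread so the
    // caller does not execute the cancellation callbacks itself.
    public static Task CancelInBackground(CancellationTokenSource cts)
    {
        return Task.Run(() => cts.Cancel());
    }

    // Returns the caller's thread and the thread that ran the catch block.
    public static (int CallerThread, int CatchThread) Run()
    {
        using (var cts = new CancellationTokenSource())
        {
            int catchThread = -1;

            async Task WorkerAsync()
            {
                try
                {
                    await Task.Delay(5000, cts.Token).ConfigureAwait(false);
                }
                catch (OperationCanceledException)
                {
                    catchThread = Environment.CurrentManagedThreadId;
                }
            }

            Task worker = WorkerAsync();

            // Returns right away; Cancel itself runs on a pool thread.
            Task cancel = CancelInBackground(cts);

            worker.Wait();
            cancel.Wait(); // keep cts alive until Cancel has finished

            return (Environment.CurrentManagedThreadId, catchThread);
        }
    }

    public static void Main()
    {
        var r = Run();
        Console.WriteLine($"caller thread {r.CallerThread}, catch thread {r.CatchThread}");
    }
}
```

The trade-off is that any exception thrown by a cancellation callback surfaces on the returned Task rather than at the call site, so the caller should observe that Task if it cares about such failures.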