
Do asynchronous context managers need to protect their cleanup code from cancellation?


The problem (I think)

The contextlib.asynccontextmanager documentation gives this example:

@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)

It looks to me like this can leak resources. If the task running this code is cancelled while it is suspended at await release_db_connection(conn), the release could be interrupted. The asyncio.CancelledError will propagate up from somewhere within the finally block, preventing the rest of the cleanup from running.

So, in practical terms, if you're implementing a web server that handles requests with a timeout, a timeout firing at the exact wrong time could cause a database connection to leak.

Full runnable example

import asyncio
from contextlib import asynccontextmanager

async def acquire_db_connection():
    await asyncio.sleep(1)
    print("Acquired database connection.")
    return "<fake connection object>"

async def release_db_connection(conn):
    await asyncio.sleep(1)
    print("Released database connection.")

@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)

async def do_stuff_with_connection():
    async with get_connection() as conn:
        await asyncio.sleep(1)
        print("Did stuff with connection.")

async def main():
    task = asyncio.create_task(do_stuff_with_connection())

    # Cancel the task just as the context manager running
    # inside of it is executing its cleanup code.
    await asyncio.sleep(2.5)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    print("Done.")

asyncio.run(main())

Output on Python 3.7.9:

Acquired database connection.
Did stuff with connection.
Done.

Note that "Released database connection." is never printed.

My questions

  • This is a problem, right? Intuitively, I expect .cancel() to mean "cancel gracefully, cleaning up any resources used along the way." (Otherwise, why would cancellation have been implemented as exception propagation?) But I could be wrong. Maybe, for example, .cancel() is meant to be fast rather than graceful. Is there an authoritative source that clarifies what .cancel() is supposed to do here?
  • If this is indeed a problem, how do I fix it?

Solution

  • Focusing on protecting the cleanup from cancellation is a red herring. There is a multitude of things that can go wrong and the context manager has no way to know

    • which errors can occur, and
    • which errors must be protected against.

    It is the responsibility of the resource handling utilities to properly handle errors.

    • If release_db_connection must not be cancelled, it must protect itself against cancellation.
    • If acquire/release must be run as a pair, it must be a single async with context manager. Further protection, e.g. against cancellation, may be involved internally as well.
    import asyncio

    async def release_db_connection(conn):
        """
        Cancellation-safe variant of `release_db_connection`.

        Internally protects against cancellation by delaying it until cleanup is done.
        """
        # Cleanup runs in a separate task so that it
        # cannot be cancelled from the outside.
        shielded_release = asyncio.create_task(asyncio.sleep(1))
        # Wait for cleanup completion. Awaiting the task directly would forward
        # a cancellation to it, and a bare `asyncio.shield` would let the
        # cancellation through immediately. Shield the task and delay the
        # cancellation until we are done.
        try:
            await asyncio.shield(shielded_release)
        except asyncio.CancelledError:
            await shielded_release
            # propagate cancellation when we are done
            raise
        finally:
            print("Released database connection.")
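
To check how a release protected this way behaves in the scenario from the question, here is a minimal, self-contained sketch. It makes a few assumptions that are not part of the original code: the real release work lives in a hypothetical helper `_do_release` (a short sleep stands in for it), the delays are shortened, events are recorded in a list instead of printed, and the release task is awaited through `asyncio.shield`.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records what happened, in order


async def acquire_db_connection():
    await asyncio.sleep(0.05)
    events.append("acquired")
    return "<fake connection object>"


async def _do_release(conn):
    # Hypothetical stand-in for the real release work.
    events.append("release started")
    await asyncio.sleep(0.2)
    events.append("released")


async def release_db_connection(conn):
    # Run the release in its own task so it can outlive a cancellation.
    release = asyncio.create_task(_do_release(conn))
    try:
        # shield() keeps a cancellation of *this* task away from `release`.
        await asyncio.shield(release)
    except asyncio.CancelledError:
        # Cancelled mid-release: wait for the release to finish, then
        # re-raise. Shielding again keeps a repeated cancel() away from it.
        await asyncio.shield(release)
        raise


@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)


async def do_stuff_with_connection():
    async with get_connection():
        await asyncio.sleep(0.05)
        events.append("used")


async def main():
    task = asyncio.create_task(do_stuff_with_connection())
    # Wait until the task is in the middle of releasing, then cancel it.
    while "release started" not in events:
        await asyncio.sleep(0.01)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return events


result = asyncio.run(main())
print(result)
```

In this sketch the cancellation is still delivered to the caller, but only after the release has run to completion.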
    

    Note: Asynchronous cleanup is tricky. For example, a simple asyncio.shield is not sufficient if the event loop does not wait for shielded tasks. Avoid inventing your own protection and rely on the underlying frameworks to do the right thing.


    The cancellation of a task is a graceful shutdown that a) still allows async operations and b) may be delayed/suppressed. Coroutines being prepared to handle the CancelledError for cleanup is explicitly allowed.

    Task.cancel

    The coroutine then has a chance to clean up or even deny the request by suppressing the exception with a try … … except CancelledError … finally block. […] Task.cancel() does not guarantee that the Task will be cancelled, although suppressing cancellation completely is not common and is actively discouraged.
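
As an illustration of the quoted behaviour, the following sketch shows a coroutine that catches the CancelledError, performs cleanup that itself awaits, and then re-raises. The names `worker` and `log` are made up for this example.

```python
import asyncio

log = []


async def worker():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Cancellation arrives as an ordinary exception, so the coroutine
        # may still await while cleaning up.
        await asyncio.sleep(0.01)
        log.append("cleaned up")
        raise  # re-raise so the task still ends up cancelled


async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(0)  # let worker start and suspend in sleep(10)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return task.cancelled()


was_cancelled = asyncio.run(main())
print(log, was_cancelled)
```

The awaiting cleanup only runs to completion here because nothing cancels the task a second time while it is cleaning up.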

    A forceful shutdown is coroutine.close/GeneratorExit. This corresponds to an immediate, synchronous shutdown and forbids suspension via await, async for or async with.

    coroutine.close

    […] it raises GeneratorExit at the suspension point, causing the coroutine to immediately clean itself up.
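
For contrast, here is a small sketch of what happens when cleanup code tries to suspend during coroutine.close(). It drives the coroutine by hand, without an event loop, and uses a hypothetical `Suspend` awaitable that suspends exactly once; neither is how asyncio code is normally run.

```python
class Suspend:
    """Hypothetical minimal awaitable that suspends the coroutine once."""

    def __await__(self):
        yield


log = []


async def cleanup_with_await():
    try:
        await Suspend()
    finally:
        log.append("cleanup started")
        await Suspend()  # suspending again is not allowed during close()
        log.append("cleanup finished")


coro = cleanup_with_await()
coro.send(None)  # advance to the first suspension point

error = None
try:
    coro.close()  # raises GeneratorExit inside the coroutine
except RuntimeError as exc:
    # close() complains because the coroutine suspended instead of exiting
    error = str(exc)

print(log, error)
```

Here close() fails with a RuntimeError and the cleanup never finishes, which is why cleanup that needs to await relies on cancellation rather than close().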