KeyboardInterrupt in asyncio.TaskGroup


The docs on Task Groups say:

Two base exceptions are treated specially: If any task fails with KeyboardInterrupt or SystemExit, the task group still cancels the remaining tasks and waits for them, but then the initial KeyboardInterrupt or SystemExit is re-raised instead of ExceptionGroup or BaseExceptionGroup.

This makes me believe, given the following code:

import asyncio

async def task():
    await asyncio.sleep(10)

async def run() -> None:
    try:
        async with asyncio.TaskGroup() as tg:
            t1 = tg.create_task(task())
            t2 = tg.create_task(task())
        print("Done")
    except KeyboardInterrupt:
        print("Stopped")

asyncio.run(run())

running and hitting Ctrl-C should result in printing Stopped; but in fact, the exception is not caught:

^CTraceback (most recent call last):
  File "<python>/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<python>/asyncio/base_events.py", line 685, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "<module>/__init__.py", line 8, in run
    async with asyncio.TaskGroup() as tg:
  File "<python>/asyncio/taskgroups.py", line 134, in __aexit__
    raise propagate_cancellation_error
  File "<python>/asyncio/taskgroups.py", line 110, in __aexit__
    await self._on_completed_fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 148, in _get_module_details
  File "<frozen runpy>", line 112, in _get_module_details
  File "<module>/__init__.py", line 15, in <module>
    asyncio.run(run())
  File "<python>/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "<python>/asyncio/runners.py", line 123, in run
    raise KeyboardInterrupt()
KeyboardInterrupt

What am I missing? What is the correct way of detecting KeyboardInterrupt?


Solution

  • TL;DR

    This actually isn't TaskGroup's fault. Try running this:

    >>> import asyncio
    >>> async def other_task():
    ...     try:
    ...         await asyncio.sleep(10)
    ...     except KeyboardInterrupt:
    ...         print("Stopped")
    ...
    >>> asyncio.run(other_task())
    KeyboardInterrupt
    
    >>>
    

    This doesn't print "Stopped" either. Nor does this:

    >>> async def other_task():
    ...     try:
    ...         await asyncio.sleep(10)
    ...     except Exception as err:
    ...         print("Stopped by", err)
    ...
    >>> asyncio.run(other_task())
    KeyboardInterrupt
    
    >>>
    

    You can't catch KeyboardInterrupt here.

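    I can't say I know every detail, but the runner code quoted further down shows the mechanism: when the main task ends with CancelledError after a Ctrl-C, Runner.run raises KeyboardInterrupt from outside the task, so the task itself never sees it. This is also why asyncio.shield is of no help in protecting tasks from this cancellation, and why one ends up writing custom signal handlers for asyncio.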



    Alternative way


    trio, made by Nathaniel J. Smith, who first came up with modern Structured Concurrency, does exactly what you intended.

    # NOTE: somehow ctrl+c doesn't work in ptpython. The following was run in the default Python shell.
    
    # Python 3.12.1 (tags/v3.12.1:2305ca5, Dec  7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)] on win32
    >>> import trio
    >>> async def task():
    ...     await trio.sleep(5)
    ...
    >>> async def run():
    ...     try:
    ...         async with trio.open_nursery() as nursery:
    ...             for _ in range(10):
    ...                 nursery.start_soon(task)
    ...     except* KeyboardInterrupt:
    ...         print("Nursery caught KeyboardInterrupt!")
    ...
    >>> trio.run(run)
    Nursery caught KeyboardInterrupt!
    >>>
    
    # Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
    >>> import trio
    >>> from exceptiongroup import catch
    >>> async def task():
    ...     await trio.sleep(5)
    ...
    >>> async def run():
    ...     def handle_keyboard_interrupt(excgroup):
    ...         nonlocal nursery
    ...         print("Nursery caught KeyboardInterrupt!")
    ...         nursery.cancel_scope.cancel()
    ...
    ...     with catch({KeyboardInterrupt: handle_keyboard_interrupt}):
    ...         async with trio.open_nursery() as nursery:
    ...             for _ in range(10):
    ...                 nursery.start_soon(task)
    ...
    >>> trio.run(run)
    ^CNursery caught KeyboardInterrupt!
    >>>
    

    Unlike asyncio, trio is not callback soup, which makes it much more stable, intuitive and predictable. (Not saying asyncio is trash; it was the right option when it was created.)

    So definitely check out trio when what you're building requires this. asyncio.TaskGroup is just asyncio's attempt at the Structured Concurrency that trio.Nursery achieves, and the original does its job better than asyncio, which inevitably suffers from its internals being callback soup.

    Edit: You can also use trio-asyncio or similar wrappers to run an asyncio loop on top of trio, or mix thread-based libraries with trio.



    More details


    As to why this happens, here is a hint:

    >>> import asyncio, time
    >>> async def sync_task():
    ...     try:
    ...         time.sleep(10)
    ...     except KeyboardInterrupt:
    ...         print("Stopping")
    ...
    >>> asyncio.run(sync_task())
    Stopping
    KeyboardInterrupt
    

    If we intentionally block the thread with a time.sleep() call, it does catch the KeyboardInterrupt (even though asyncio raises it again afterwards).

    This is just the same as the following:

    >>> def sync_task():
    ...     try:
    ...         time.sleep(10)
    ...     except KeyboardInterrupt:
    ...         print("Stopping")
    >>> sync_task()
    Stopping
    

    Now we have established that we can catch KeyboardInterrupt in a synchronous section of code, regardless of whether the surrounding context is async or not.

    Then we can draw the attention back to:

    Is await asyncio.sleep(10) synchronous?

    This might sound obvious, but it's not!

    ...But then, what is being executed if it's not synchronous?


    That's the core reason why our asyncio code can't catch a thing on KeyboardInterrupt: what is running at that moment is not our task but the event loop on the main thread, which is constantly checking whether some event should be triggered.

    In other words, we just killed the main loop with KeyboardInterrupt while our tasks were suspended, waiting for the main thread's loop to call its callback for the event "call me AFTER 10 seconds have passed". Hence there was no chance for our poor task to ever catch that exception.
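    To make "call me after 10 seconds" concrete, here is a rough sketch of a sleep built from a future plus a timer callback. toy_sleep is a hypothetical name used only for illustration; asyncio.sleep works along similar lines, but this is not its actual source.

```python
import asyncio


async def toy_sleep(delay: float, result=None):
    """Suspend the caller until the loop's timer callback resolves a future."""
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # Ask the loop: "call fut.set_result(result) after `delay` seconds".
    handle = loop.call_later(delay, fut.set_result, result)
    try:
        # The coroutine is suspended here; only the event loop is running.
        return await fut
    finally:
        # If we were cancelled first, make sure the timer never fires.
        handle.cancel()


result = asyncio.run(toy_sleep(0.01, "woke"))
print(result)
```

    While the coroutine is parked on the future, no line of our own code is executing, so there is no try block "active" in which an interrupt could land.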

    This is because, unlike other exceptions, which are stored as the result of the task (and then re-raised at the await that waits for it), KeyboardInterrupt is an urgent exception signaling us to stop whatever we're doing, so the event loop decides to stop and starts cleaning up if possible.
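    The "stored as the result of the task" part can be seen with an ordinary exception. A minimal sketch, with failing and main as illustrative names:

```python
import asyncio


async def failing() -> None:
    raise ValueError("boom")


async def main() -> tuple:
    task = asyncio.create_task(failing())
    # asyncio.wait() waits for the task to finish without re-raising its exception.
    await asyncio.wait([task])
    stored = task.exception()  # the exception is kept on the task object
    try:
        await task  # awaiting the task re-raises the stored exception here
    except ValueError as err:
        return type(stored).__name__, str(err)


result = asyncio.run(main())
print(result)
```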



    Way more detail about Callback-based vs async/await native


    To quote from Control-C handling in Python and Trio by Nathaniel J. Smith :

    By default, the Python interpreter sets things up so that control-C will cause a KeyboardInterrupt exception to materialize at some point in your code.

    This is pretty nice! If your code was accidentally caught in an infinite loop, then it breaks out of that. If you have cleanup code in finally blocks, it gets run. It shows a traceback so you can find that infinite loop.

    That's the advantage of the KeyboardInterrupt approach: even if you didn't think about control-C at all while you were writing the program, then it still does something that's pretty darn reasonable – say, 99% of the time.

    lock.acquire()
    try:
        do_stuff()             # <-
        do_something_else()    # <- control-C anywhere here is safe
        and_some_more()        # <-
    finally:
        lock.release()
    

    But what if we're unlucky?

    lock.acquire()
                               # <- control-C could happen here
    try:
        ...
    finally:
                               # <- or here
        lock.release()
    

    If a KeyboardInterrupt happens at one of the two points marked above, then sucks to be us: the exception will propagate but our lock will never be released.


    ... KeyboardInterrupt is such a powerful and dangerous message to Python: it's not a mere simple Exception but is almost like a SIGINT for Python. (On Windows, task kill triggers KeyboardInterrupt because it doesn't have SIGINT.)


    This is the major disadvantage of the callback-based concurrency we have had for years in Javascript and many other languages, including Python's Future and asyncio.

    From Python's history, 1991~2015:

    • First public Python release (Python 0.9, 1991)
    • Asyncore library (Python 1, 1996)
      • Callback based
    • Generator support added (Python 2.2, 2001)
    • Twisted (Python 2.X, 2002)
      • Easier concurrency programming via Generator
    • Generator.send() support added (Python 2.5, 2006)
      • Now generator can 'talk' with caller
    • Future library (Python 3.2, 2011)
    • asyncio library (Python 3.4, 2014)
      • generator based, wraps Future
      • Best of the bests from previous libraries
    • async / await added (Python 3.5, 2015)


    asyncio was never designed with async / await in mind; it is based on callbacks, much like Javascript has been in recent years. (The initial Javascript release in 1995 didn't have concurrency, iirc.)

    So how we use asyncio with async / await nowadays is, under the hood, a complete hybrid mess trying to make callback soup work on something fundamentally different.
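    As a small illustration of the two styles living side by side, the sketch below gets the same value once through a done-callback and once through await. compute and main are hypothetical names; both mechanisms are public asyncio API.

```python
import asyncio


async def compute() -> int:
    await asyncio.sleep(0)
    return 21


async def main() -> list:
    results = []

    # Callback style: register a function the loop calls when the task is done.
    task = asyncio.ensure_future(compute())
    task.add_done_callback(lambda t: results.append(("callback", t.result() * 2)))
    await task
    await asyncio.sleep(0)  # give the loop one more turn, to be safe

    # async/await style: the result simply comes back at the await.
    value = await compute()
    results.append(("await", value * 2))
    return results


results = asyncio.run(main())
print(results)
```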


    When we call asyncio.run(async_func) - it runs in this order:

    # NOTE: all comments marked with ^^^^^^ are mine, explaining the relevant parts
    
    
    # -----------------------------------------------------------------------
    # asyncio/runners.py
    
    def run(main, *, debug=None, loop_factory=None):
        ...
        
        with Runner(debug=debug, loop_factory=loop_factory) as runner:
            return runner.run(main)
            # ^^^^^ entry point
    
    
    # -----------------------------------------------------------------------
    # asyncio/runners.py
    
    class Runner:
        ...
        
        def run(self, coro, *, context=None):
            """Run a coroutine inside the embedded event loop."""
            
            ...
            
            task = self._loop.create_task(coro, context=context)
            # ^^^^^^ The coroutine is wrapped in a task, which is immediately scheduled for execution
    
            if (threading.current_thread() is threading.main_thread()
                and signal.getsignal(signal.SIGINT) is signal.default_int_handler
            ):
                # ^^^^^^ The custom SIGINT handler is only installed on the main thread,
                #        and only if the default SIGINT handler is still in place
                
                sigint_handler = functools.partial(self._on_sigint, main_task=task)
            else:
                sigint_handler = None
            
            self._interrupt_count = 0
            try:
                return self._loop.run_until_complete(task)
                # ^^^^^ entry point
            
            except exceptions.CancelledError:
                if self._interrupt_count > 0:
                    uncancel = getattr(task, "uncancel", None)
                    if uncancel is not None and uncancel() == 0:
                        raise KeyboardInterrupt()
                        # ^^^^^^ This is why KeyboardInterrupt is raised after task cancellation
                
                raise  # CancelledError
            finally:
                ...
    
    
    # -----------------------------------------------------------------------
    # asyncio/base_events.py
    # This file and class are abstract; the actual loop depends on the OS
    # due to differences in event handling per OS.
    
    
    class BaseEventLoop(events.AbstractEventLoop):
        def run_until_complete(self, future):
            ...
            
            new_task = not futures.isfuture(future)
            future = tasks.ensure_future(future, loop=self)
            ...
            
            future.add_done_callback(_run_until_complete_cb)
            # ^^^^^^ Another hint of callback soup
            
            try:
                self.run_forever()
                # ^^^^^ entry point
            ...
    
        ...
    
        def run_forever(self):
            """Run until stop() is called."""
            ...
            try:
                self._thread_id = threading.get_ident()
                ...
                
                while True:
                    self._run_once()
                    # ^^^^^^ entry point (complex callback & event checks, skipping)
                    
                    if self._stopping:
                        break
            ...
    

    This is quite complex. Everything revolves around callbacks, and we can't find a direct point where the task actually gets executed: even self._run_once() is just the event loop running the callbacks that have been queued.

    No wonder this has so many bugs and stability issues. It makes one marvel at how much effort must have gone into django, flask, fastAPI etc. to make them work with great stability.
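    The shape of such a loop can be sketched in a few lines. This is a toy under heavy simplification, not asyncio's implementation: ToyLoop, step and worker are hypothetical names, and plain generators stand in for coroutines. The point it shows is that a "task" is never run directly; the loop only pops queued callbacks, and each callback advances a task by one step and re-queues itself.

```python
from collections import deque


class ToyLoop:
    """A toy callback queue, loosely mirroring call_soon / _run_once."""

    def __init__(self) -> None:
        self._ready = deque()

    def call_soon(self, callback, *args) -> None:
        self._ready.append((callback, args))

    def run_once(self) -> None:
        # Run only the callbacks that were queued before this iteration began.
        for _ in range(len(self._ready)):
            callback, args = self._ready.popleft()
            callback(*args)

    def run_until_idle(self) -> None:
        while self._ready:
            self.run_once()


def step(loop: ToyLoop, gen, log: list) -> None:
    """Advance a generator-based 'task' by one step, then re-queue it."""
    try:
        log.append(next(gen))
    except StopIteration:
        return  # task finished, nothing more to schedule
    loop.call_soon(step, loop, gen, log)


def worker(name: str, n: int):
    for i in range(n):
        yield f"{name}{i}"  # each yield hands control back to the loop


loop = ToyLoop()
log = []
loop.call_soon(step, loop, worker("a", 2), log)
loop.call_soon(step, loop, worker("b", 2), log)
loop.run_until_idle()
print(log)
```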


    To quote Some thoughts on asynchronous API design in a post-async/await world again from NJS:

    Review and summing up: what is "async/await-native" anyway?

    In previous asynchronous APIs for Python, the use of callback-oriented programming led to the invention of a whole set of conventions that effectively make up an entire ad hoc programming language...

    The result is somewhat analogous to the bad old days before structured programming, where basic constructs like function calls and loops had to be constructed on the fly out of primitive tools like goto.

    In practice, it's extraordinarily difficult to write correct code in this style, especially when one starts to think about edge conditions.

    Now that Python has async/await, it's possible to start using Python's native mechanisms to solve these problems.


    Such limitations of asyncio, together with seeing how effective curio was, seem to have led NJS to finalize the concept of Structured Concurrency and create trio, opening a path for Python to a bright future without future!

    (Actual image attached in Some thoughts on asynchronous API design in a post-async/await world)

    I can't say I'm an expert, but from my testing with a simple remote-python-execution-shell server attached to Discord (yes, I actually sacrificed my Raspberry Pi for this): virtually the same code died about once a week on asyncio, while the trio rewrite never died in over 3 months.



    Is asyncio trash then?


    No, I think it was the best library we had when it came out, and it remained so until we got trio. I believe trio was possible because there was asyncio!

    To quote Some thoughts on asynchronous API design in a post-async/await world again & again from NJS:

    Should asyncio be "fixed" to have a curio-style async/await-native API?

    I can't see how this could be done without substantially throwing out and rewriting most of asyncio. ... but the callback chaining parts are pretty deeply baked into asyncio as it currently exists.

    It seems it's now too late to rewrite it, also considering that it has been in the standard library for nearly a decade. I do wish asyncio would stop trying to mimic Structured Concurrency, as with the recent addition of TaskGroup, and would rather be rewritten or deprecated in favor of other libraries. Since that's not feasible, I suppose they're adding it to at least improve the current situation.

    It still serves just fine for many simple use cases anyway!