Tags: python-asyncio, weak-references, python-internals

What's the benefit of asyncio using weakrefs to keep track of tasks?


The Python docs for asyncio.create_task state:

Important: Save a reference to the result of this function, to avoid a task disappearing mid-execution. The event loop only keeps weak references to tasks. A task that isn’t referenced elsewhere may get garbage collected at any time, even before it’s done. For reliable “fire-and-forget” background tasks, gather them in a collection:
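
The collection idiom the docs go on to show looks roughly like this (a hedged sketch, not the docs' verbatim code; some_coro is a placeholder coroutine):

    import asyncio

    background_tasks = set()

    async def some_coro(i):
        await asyncio.sleep(0.1)  # placeholder work

    async def spawn_all():
        for i in range(10):
            task = asyncio.create_task(some_coro(i))
            # The set holds a strong reference, so the task can't be
            # garbage collected mid-execution.
            background_tasks.add(task)
            # Drop the reference once the task finishes, so done tasks
            # don't accumulate.
            task.add_done_callback(background_tasks.discard)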

What's the benefit for asyncio to use a weakref instead of a strong reference to the created task until it is completed? As the warning above suggests, there's definitely at least one benefit to keeping strong refs -- which means there's likely an offsetting benefit to weakrefs. Note that with strong refs, asyncio could, upon completion, either remove the reference entirely or switch to a weakref, depending on what the asyncio logic requires.

The use case implicit in the documentation warning is: we don't want to wait for the task (in particular, the task's return value is not used), but we do want to give the task a chance to run eventually.

Note that the asyncio.TaskGroup context does keep strong references to the tasks it creates. However, it's not suitable for the above use case, since upon exit the context blocks until all of its tasks complete. (Also, it keeps a strong reference to each task even after it completes, so it will leak memory if the context lives much longer than its tasks.)
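
For illustration, a minimal sketch of that blocking behavior (assumes Python 3.11+, where asyncio.TaskGroup exists):

    import asyncio

    async def main():
        async with asyncio.TaskGroup() as tg:
            tg.create_task(asyncio.sleep(1))  # the group holds a strong reference
        # Only reached after every task has finished: __aexit__ awaits them
        # all, so a TaskGroup cannot express fire-and-forget.
        print("all tasks done")

    asyncio.run(main())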


Solution

  • Sorry if this is not really an answer to the "why" - I think only whoever actually implemented it can give the real motivation, if there is any besides what you quote in the question.

    On second thought, one can't even check whether a weakly referenced task still has any hard references until it is about to be switched into and executed: at task creation, the only hard reference to it is the one returned to the caller. So, once the decision was taken not to hold hard references to tasks, for fear of resource leakage, the "bomb" was out. It was likely only noticed later, and by then making the loop hold a hard reference would probably change too much of the working behavior for certain workloads.
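
    For context, this is how weak references behave in general (a minimal sketch, independent of asyncio):

    import gc
    import weakref

    class Payload:
        pass

    obj = Payload()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    del obj                 # drop the only hard reference
    gc.collect()            # CPython frees it by refcounting even without this
    print(ref())            # None: the referent is gone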

    update As a drawback to holding a hard reference: if tasks are created "fire and forget" style, eventually-done tasks would keep carrying their result (or exception) data and could not simply be dropped: they'd have to be kept indefinitely, which could be a major resource leak for a server-type app. On the other hand, where to "draw the line" for when a completed task should be considered "orphan" and discardable? So it appears the "line" chosen by the implementation, even if not that well thought out, is whether the task has a hard reference elsewhere.
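
    To make that concrete (a minimal sketch; the large string stands in for any sizable result):

    import asyncio

    async def work():
        return "x" * 10_000_000  # ~10 MB result stays attached to the task

    async def main():
        t = asyncio.create_task(work())
        await t
        # As long as t is referenced, the task - and its 10 MB result -
        # stays alive; a registry of done tasks would leak exactly this way.
        print(len(t.result()))

    asyncio.run(main())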

    In the source code, when a task is created it is immediately scheduled as ready in an internal structure of the running loop - that is a hard reference. Once the loop cycles through an iteration and has called the task's scheduled step, this hard reference is dropped. That this dropping is not deterministic is perhaps a bug that could be fixed. /update
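
    The weak bookkeeping itself can be mimicked with a WeakSet (a sketch of the general mechanism, not asyncio's actual code):

    import gc
    import weakref

    class Task:  # stand-in for a real asyncio.Task
        pass

    all_tasks = weakref.WeakSet()  # like the loop's task registry

    t = Task()
    all_tasks.add(t)
    print(len(all_tasks))  # 1: the hard reference t keeps it alive
    del t
    gc.collect()
    print(len(all_tasks))  # 0: the WeakSet alone did not keep it alive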

    Maybe now that the problem is getting some awareness beyond this notice (I've seen a Twitter thread about it last week as well, with high-profile Python folks worried about the behavior), they'll take the step of making an incompatible change there - or come up with a create_task successor that does the right thing. TaskGroups, as you put it, are not quite it - they are more a substitute for asyncio.gather - and even there with some drawbacks (the whole group is cancelled at __aexit__ if a single task raises an exception, with no practical workaround).

    As an anecdote, I experimented around, and the results are really nasty, with random task drops starting at around 2500 concurrent tasks with this code:

    import asyncio
    import random

    async def blah(n):
        await asyncio.sleep(random.random())
        results.add(n)

    async def main(m):
        for i in range(m):
            asyncio.create_task(blah(i))  # no reference kept: pure fire-and-forget
        await asyncio.sleep(1.01)  # just enough time for every task to finish

    def doit(m):
        global results
        results = set()
        asyncio.run(main(m))
        try:
            # compare as sets: iteration order of a set is an implementation detail
            assert set(range(m)) == results
        except AssertionError:
            print(f"For {m} tasks, missing {m - len(results)} results: {set(range(m)) - results}")
    

    Including an await asyncio.sleep(0) in the loop where the tasks are created makes it all run flawlessly up to millions of tasks.
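
    That is, the same main as above with one extra yield per iteration:

    async def main(m):
        for i in range(m):
            asyncio.create_task(blah(i))
            await asyncio.sleep(0)  # yield so the loop can start each new task right away
        await asyncio.sleep(1.01)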