
When should I use asyncio.create_task?


I am using Python 3.10 and I am a bit confused about asyncio.create_task.

In the following example code, the functions are executed in coroutines whether or not I use asyncio.create_task. It seems that there is no difference.

How can I determine when to use asyncio.create_task, and what are the advantages of using it compared to passing the coroutines directly?

import asyncio
from asyncio import sleep

async def process(index: int):
    await sleep(1)
    print('ok:', index)

async def main1():
    tasks = []
    for item in range(10):
        tasks.append(asyncio.create_task(process(item)))
    await asyncio.gather(*tasks)

async def main2():
    tasks = []
    for item in range(10):
        tasks.append(process(item)) # Without asyncio.create_task
    await asyncio.gather(*tasks)

asyncio.run(main1())
asyncio.run(main2())

Solution

    TL;DR

    Use create_task if you want to schedule a coroutine for execution right away but do not need to wait for it to finish immediately, so that you can move on to something else first.

    Explanation

    As has been pointed out in the comments already, asyncio.gather itself wraps the provided awaitables in tasks, which is why it is essentially redundant to call create_task on them beforehand in your simple example.

    From the gather docs:

    If any awaitable [...] is a coroutine, it is automatically scheduled as a Task.
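    To illustrate the quoted behavior, here is a minimal sketch that passes bare coroutine objects to gather and measures how long the call takes. The process function mirrors the one in the question; the 0.2 second delay and the timing code are illustrative assumptions, not part of the original example.

```python
import asyncio
import time


async def process(index: int) -> int:
    await asyncio.sleep(0.2)
    return index


async def main() -> tuple[list[int], float]:
    start = time.perf_counter()
    # Bare coroutine objects: gather wraps each one in a Task itself.
    results = await asyncio.gather(*(process(i) for i in range(5)))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = asyncio.run(main())
print(results)            # results come back in argument order
print(f"{elapsed:.2f}s")  # close to 0.2s, not 5 * 0.2s, if they ran concurrently
```

    If the five coroutines ran one after another, the total would be about one second, so a much shorter elapsed time suggests that gather scheduled them concurrently without any explicit create_task call.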


    That being said, the two examples you constructed are not equivalent!

    When you call create_task, the Task is immediately scheduled for execution on the event loop. This means that if a context switch takes place after you have called create_task for all your coroutines (as in your first example), any number of them may start executing right away, without you having to await them explicitly.

    From the create_task docs: (my emphasis)

    Wrap the [...] coroutine into a Task and schedule its execution.

    By contrast, when you simply create the coroutines (as in your second example), they will not begin execution by themselves. Nothing runs until you either await them directly or hand them to something that schedules them, such as gather or create_task.

    You can see this in action if you add any await (e.g. asyncio.sleep) between the creation and the gather call, along with a few helpful print statements:

    from asyncio import create_task, gather, sleep, run
    
    
    async def process(index: int):
        await sleep(.5)
        print('ok:', index)
    
    
    async def create_tasks_then_gather():
        tasks = [create_task(process(item)) for item in range(5)]
        print("tasks scheduled")
        await sleep(2)  # <-- because of this `await` the tasks may begin to execute
        print("now gathering tasks")
        await gather(*tasks)
        print("gathered tasks")
    
    
    async def create_coroutines_then_gather():
        coroutines = [process(item) for item in range(5)]
        print("coroutines created")
        await sleep(2)  # <-- despite this, the coroutines will not begin execution
        print("now gathering coroutines")
        await gather(*coroutines)
        print("gathered coroutines")
    
    
    run(create_tasks_then_gather())
    run(create_coroutines_then_gather())
    

    Output:

    tasks scheduled
    ok: 0
    ok: 1
    ok: 2
    ok: 3
    ok: 4
    now gathering tasks
    gathered tasks
    
    coroutines created
    now gathering coroutines
    ok: 0
    ok: 1
    ok: 2
    ok: 3
    ok: 4
    gathered coroutines
    

    As you can see, in create_tasks_then_gather the process body was executed before the gather call, whereas in create_coroutines_then_gather it was executed only after.

    Therefore, whether or not using create_task is useful depends on the situation. If you only care about the coroutines being executed concurrently and awaited at that particular point in your code, there is no use in calling create_task. If you want to schedule them, but then move on to something else, while they may or may not do their thing in the background, it makes sense to use create_task.
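    The "schedule it, then move on" case can be sketched as follows. The names background_job and events are hypothetical and exist only for this illustration; the sleeps stand in for real work.

```python
import asyncio

events: list[str] = []


async def background_job() -> str:
    events.append("job started")
    await asyncio.sleep(0.1)
    events.append("job finished")
    return "result"


async def main() -> str:
    # Scheduled on the event loop, but it has not started running yet.
    task = asyncio.create_task(background_job())
    events.append("main continues")
    # Stand-in for other work; the task runs while main is suspended here.
    await asyncio.sleep(0.2)
    events.append("main done with other work")
    # The task has already finished, so this returns its result right away.
    return await task


outcome = asyncio.run(main())
print(events)
print(outcome)
```

    Here the job starts only once main reaches its first await, and it has completed by the time main finally awaits the task. With a bare coroutine instead of a Task, the job would not start until that final await.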

    One important thing to remember, however, is that you can only be sure the tasks you scheduled actually run to completion if you await them at some point. This is why you should still await them eventually, for example with gather, even if you scheduled them earlier with create_task.
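    A small sketch of what can go wrong otherwise: when the coroutine passed to asyncio.run returns, tasks that are still pending are cancelled. The names slow_job, forget, wait and finished are hypothetical and used only for this illustration.

```python
import asyncio

finished: list[str] = []


async def slow_job(name: str) -> None:
    await asyncio.sleep(0.2)
    finished.append(name)


async def forget() -> None:
    # The task is scheduled, but nothing ever waits for it to finish.
    asyncio.create_task(slow_job("forgotten"))
    # Yield once so the task can start, then return while it is still sleeping.
    await asyncio.sleep(0)


async def wait() -> None:
    task = asyncio.create_task(slow_job("awaited"))
    await task  # main does not return until the task is done


asyncio.run(forget())  # the pending task is cancelled when forget() returns
asyncio.run(wait())
print(finished)
```

    Only the awaited job gets to append its name, because the other one is cancelled before its sleep is over.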