
Is there a clean way of starting a task execution straight away with asyncio.create_task()?


I have the following code:

import asyncio
import time

async def coro_1(seconds=5):
    await asyncio.sleep(seconds)

async def main_1():
    task_1 = asyncio.create_task(coro_1())
    # Long computation here
    time.sleep(5)
    await task_1

async def main_2():
    task_1 = asyncio.create_task(coro_1())
    await asyncio.sleep(0)
    # Long computation here
    time.sleep(5)
    await task_1

if __name__ == '__main__':
    start = time.time()
    asyncio.run(main_1())
    end = time.time()
    print(f'Main_1 took { end - start } seconds.')
    start = time.time()
    asyncio.run(main_2())
    end = time.time()
    print(f'Main_2 took { end - start } seconds.')

The output:

Main_1 took 10.005882263183594 seconds.
Main_2 took 5.005404233932495 seconds.

I understand that main_1 takes longer because the time.sleep() does not run "concurrently" with the asyncio.sleep(). As far as I understand, this is because the task does not start executing until the main_1 coroutine "yields" control at the await task_1 statement.

In main_2 this does not happen because we let the task start by "yielding" with await asyncio.sleep(0).
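
To make that scheduling order visible, here is a minimal sketch. The `events` list and the `worker` coroutine are hypothetical names used only for illustration. It records when each step runs and suggests that create_task() only schedules the coroutine, which first runs once main yields to the event loop.

```python
import asyncio

events = []  # hypothetical log of what ran, in order


async def worker():
    # Runs only once the event loop gets control back from main().
    events.append("worker started")


async def main():
    task = asyncio.create_task(worker())  # scheduled, not yet started
    events.append("after create_task")
    await asyncio.sleep(0)                # yield once so the task can run
    events.append("after first yield")
    await task


asyncio.run(main())
print(events)
# ['after create_task', 'worker started', 'after first yield']
```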

Is there a better way of achieving this behaviour? I would like to create a task and have it started straight away without needing an explicit asyncio.sleep(0) so my code runs faster. I feel like adding sleeps all over the place is ugly and adds a lot of boilerplate code.

Any suggestions?


Solution

  • Technically, you can fake it by wrapping the blocking computation in a task of its own:

    import asyncio
    import time
    
    async def coro_1(seconds=5):
        print("c1 start")
        await asyncio.sleep(seconds)
        print("c1 end")
    
    async def coro_2():
        # Not really asynchronous: time.sleep() blocks the whole event loop.
        print("c2 start")
        time.sleep(5)
        print("c2 end")
    
    async def main_1():
        task_1 = asyncio.create_task(coro_1())
        task_2 = asyncio.create_task(coro_2())
        await asyncio.gather(task_1,task_2)
    
    if __name__ == '__main__':
        start = time.time()
        asyncio.run(main_1())
        end = time.time()
        print(f'Main_1 took { end - start } seconds.')
    

    But it's a hack and it's fragile: I'm not sure anyone guarantees that tasks start in their order of creation. Also, the asynchronous coroutine only gets started, meaning it runs up to its first await, and then the synchronous one runs to completion. This is more visible with a multi-step example:

    import asyncio
    import time
    
    async def coro_1(label):
        # Cooperative: yields to the event loop on every await.
        print(f"{label} start")
        for c in range(1, 6):
            await asyncio.sleep(1)
            print(f'{label} {c}')
        print(f"{label} end")
    
    async def coro_2(label):
        # Blocking: time.sleep() never yields, so nothing else runs meanwhile.
        print(f"{label} start")
        for c in range(1, 6):
            time.sleep(1)
            print(f'{label} {c}')
        print(f"{label} end")
    
    async def main_1():
        task_1 = asyncio.create_task(coro_1("async1"))
        task_2 = asyncio.create_task(coro_1("async2"))
        task_3 = asyncio.create_task(coro_2("heavy"))
        await asyncio.gather(task_1,task_2,task_3)
    
    if __name__ == '__main__':
        start = time.time()
        asyncio.run(main_1())
        end = time.time()
        print(f'Main_1 took { end - start } seconds.')
    

    This one produces the output:

    async1 start
    async2 start
    heavy start
    heavy 1            <--
    heavy 2            <--
    heavy 3            <-- no yield, completely synchronous
    heavy 4            <--
    heavy 5            <--
    heavy end
    async1 1
    async2 1
    async1 2
    async2 2
    async1 3
    async2 3
    async1 4
    async2 4
    async1 5
    async1 end
    async2 5
    async2 end
    Main_1 took 9.072734117507935 seconds.
    

    Here you can see how the awaits yield: the first asyncio.sleep(1) calls wait (not run) in parallel with each other and with the synchronous "task". The synchronous task never yields and runs to completion; only then are the remaining 4 seconds of each asynchronous coroutine waited out together. That's what async/await generally does (in other languages too): the coroutines wait together, but by default they still run on a single CPU core, doing one thing at a time.
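
The "waiting together" behaviour can also be measured directly. The sketch below uses hypothetical helpers (`waiter`, `blocker`, `timed`) and shortened delays of 0.3 seconds. Three cooperative sleeps should overlap and finish in roughly one delay, while three blocking sleeps run back to back and take roughly three.

```python
import asyncio
import time


async def waiter(delay):
    # Cooperative wait: yields to the event loop while sleeping.
    await asyncio.sleep(delay)


async def blocker(delay):
    # Blocking wait: holds the event loop for the whole delay.
    time.sleep(delay)


async def timed(factory, delay=0.3, count=3):
    # Run `count` coroutines built by `factory` together and time them.
    start = time.perf_counter()
    await asyncio.gather(*(factory(delay) for _ in range(count)))
    return time.perf_counter() - start


cooperative = asyncio.run(timed(waiter))
blocking = asyncio.run(timed(blocker))
print(f"cooperative: {cooperative:.2f}s, blocking: {blocking:.2f}s")
```

On a typical machine the first figure is close to 0.3 seconds and the second close to 0.9 seconds.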

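    Then, if you're content with using a single CPU core, i.e. you have exactly one CPU-intensive computation, you can use to_thread (in standard CPython builds, threads share the interpreter lock, so Python code in them does not run on several cores at once). The heavy function is no longer async, and I replaced time.sleep() with an actual busy loop to show that they really overlap: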

    import asyncio
    import time
    
    async def coro_1(label):
        print(f"{label} start")
        for c in range(1, 6):
            await asyncio.sleep(1)
            print(f'{label} {c}')
        print(f"{label} end")
    
    def ro_2(label):
        # Plain (non-async) function: it runs in a worker thread via to_thread().
        print(f"{label} start")
        for c in range(1, 6):
            # Busy-wait for one second to simulate CPU-bound work.
            deadline = time.time() + 1
            while time.time() < deadline:
                pass
            print(f'{label} {c}')
        print(f"{label} end")
    
    async def main_1():
        task_1 = asyncio.create_task(coro_1("async1"))
        task_2 = asyncio.create_task(coro_1("async2"))
        # to_thread() (Python 3.9+) returns a coroutine; gather() wraps it in a task.
        task_3 = asyncio.to_thread(ro_2,"heavy")
        await asyncio.gather(task_1,task_2,task_3)
    
    if __name__ == '__main__':
        start = time.time()
        asyncio.run(main_1())
        end = time.time()
        print(f'Main_1 took { end - start } seconds.')
    

    This produces the output:

    async1 start
    async2 start
    heavy start
    heavy 1
    async1 1
    async2 1
    heavy 2
    async1 2
    async2 2
    heavy 3
    async1 3
    async2 3
    heavy 4
    async1 4
    async2 4
    heavy 5
    heavy end
    async1 5
    async1 end
    async2 5
    async2 end
    Main_1 took 5.1116883754730225 seconds.
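
Applied to the code from the question, the same idea could look like the sketch below. Here `long_computation` is a hypothetical stand-in for the blocking work, the delays are shortened to half a second, and Python 3.9+ is assumed for asyncio.to_thread(). Awaiting to_thread() moves the blocking call off the event loop and yields right away, so task_1 gets started without an explicit asyncio.sleep(0).

```python
import asyncio
import time


async def coro_1(seconds=0.5):
    await asyncio.sleep(seconds)


def long_computation(seconds=0.5):
    # Hypothetical stand-in for the blocking work done in main_1.
    time.sleep(seconds)


async def main():
    task_1 = asyncio.create_task(coro_1())
    # Awaiting to_thread() yields to the event loop, so task_1 starts
    # while the blocking call runs in a worker thread.
    await asyncio.to_thread(long_computation)
    await task_1


start = time.perf_counter()
asyncio.run(main())
elapsed = time.perf_counter() - start
print(f"main took {elapsed:.2f} seconds.")
```

With both waits overlapping, the total should be close to half a second rather than a full second.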
    

    Another answer has already arrived, so I'll stop here.