python-3.x asynchronous async-await python-asyncio

How to run CPU bound async function in separate thread with asyncio.to_thread?


I'm making multiple calls to a 3rd party lib using asyncio tasks, wrapped in a function run_task, which is CPU intensive.

I tried using asyncio.to_thread to run it in a separate thread, but I am getting errors.

Here is a simplified, runnable version of my code, using shazamio:

import asyncio
from shazamio import Shazam

async def run_task(shazam):
    ret = await asyncio.to_thread(shazam.recognize_song, '01 - Forest Drive West - Impulse.mp3')
    print(ret)
    return 1 

async def run_all_tasks(iters):
    shazam = Shazam()
    loop = asyncio.get_event_loop()

    coros = [run_task(shazam) for i in range(iters)]
    await asyncio.gather(*coros)
    return 

if __name__ == '__main__':
    asyncio.run(run_all_tasks(10))

With this, ret is a coroutine object, so the await did not work. I also get these warnings:

<coroutine object Shazam.recognize_song at 0x11054bf10>
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py:1936: RuntimeWarning: coroutine 'Shazam.recognize_song' was never awaited
  handle = self._ready.popleft()
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Solution

  • What the warning you are getting says is that .recognize_song itself is already written as a coroutine (and indeed it is, as can be seen in the docs: https://github.com/dotX12/ShazamIO).

    On the other hand, .to_thread expects to receive a regular (blocking) Python function: it runs that function in a worker thread from the event loop's default thread pool and hands its return value back to the awaiting coroutine.

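    What takes place is this: .to_thread calls the coroutine method .recognize_song in the worker thread, and that call immediately returns a coroutine object without running any of its body. .to_thread doesn't check for this and returns that object as the result of calling the "blocking" function. Since nothing ever awaits it, Python emits the "coroutine 'Shazam.recognize_song' was never awaited" RuntimeWarning when the object is garbage-collected.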

    The fix in this case is trivial: just await the shazam.recognize_song call directly, without asyncio.to_thread.

    By the way, this is not a "CPU-intensive task" locally: what it does is use the Shazam API to identify your song remotely, so it is about as I/O-bound as it gets. The remote API will, yes, do CPU-intensive work, but on remote CPUs in the host's data center.
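To see the mechanism in isolation, here is a minimal sketch using only the standard library. blocking_lookup and async_lookup are hypothetical stand-ins, not part of shazamio: the first is a regular function, which is what asyncio.to_thread expects; the second is a coroutine function, like Shazam.recognize_song. Passing the coroutine function to to_thread hands back an un-run coroutine object, just as in the question.

```python
import asyncio
import inspect
import time


def blocking_lookup(path: str) -> dict:
    # Hypothetical regular (synchronous) function: to_thread runs its body in a worker thread.
    time.sleep(0.05)  # stands in for blocking work
    return {"path": path}


async def async_lookup(path: str) -> dict:
    # Hypothetical coroutine function, like Shazam.recognize_song:
    # calling it only creates a coroutine object; the body runs when awaited.
    await asyncio.sleep(0.05)
    return {"path": path}


async def main() -> tuple:
    # Correct use: to_thread + plain function returns the function's real result.
    sync_result = await asyncio.to_thread(blocking_lookup, "song.mp3")

    # The bug from the question: to_thread + coroutine function returns an un-run coroutine.
    wrong_result = await asyncio.to_thread(async_lookup, "song.mp3")
    is_coro = inspect.iscoroutine(wrong_result)
    wrong_result.close()  # close it explicitly so no "never awaited" warning is emitted

    # The fix: await the coroutine function directly on the event loop.
    fixed_result = await async_lookup("song.mp3")
    return sync_result, is_coro, fixed_result


if __name__ == "__main__":
    sync_result, is_coro, fixed_result = asyncio.run(main())
    print(sync_result)   # {'path': 'song.mp3'}
    print(is_coro)       # True
    print(fixed_result)  # {'path': 'song.mp3'}
```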
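Applied to the question's program, the fix might look like the sketch below. To keep it runnable without network access, FakeShazam is a hypothetical stand-in for shazamio.Shazam whose recognize_song is a coroutine that only sleeps to simulate waiting on the remote API; with the real library you would use `from shazamio import Shazam` and `Shazam()` as in the question. Because the calls are I/O-bound, asyncio.gather lets their waits overlap on the single event-loop thread, so no extra threads are needed.

```python
import asyncio
import time


class FakeShazam:
    """Hypothetical stand-in for shazamio.Shazam, used only so this sketch runs offline.

    Like the real Shazam.recognize_song, recognize_song here is a coroutine function;
    asyncio.sleep simulates waiting on the remote API.
    """

    async def recognize_song(self, path: str) -> dict:
        await asyncio.sleep(0.2)  # simulated network round-trip
        return {"file": path}


async def run_task(shazam: FakeShazam) -> dict:
    # Await the coroutine directly: no asyncio.to_thread for an async, I/O-bound call.
    ret = await shazam.recognize_song("01 - Forest Drive West - Impulse.mp3")
    print(ret)
    return ret


async def run_all_tasks(iters: int) -> list:
    shazam = FakeShazam()
    # gather runs all calls concurrently; their waits overlap on the event-loop thread.
    return await asyncio.gather(*(run_task(shazam) for _ in range(iters)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_all_tasks(10))
    elapsed = time.perf_counter() - start
    # The simulated waits overlap, so this is close to 0.2s rather than 10 * 0.2s.
    print(f"{len(results)} calls in {elapsed:.2f}s")
```

Ten simulated calls of 0.2 s each should finish in roughly 0.2 s total here; with the real API, the actual timing depends on the remote service.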