Search code examples
pythonpython-3.xpython-asynciothreadpoolexecutor

Using Threadpool in an Async method without run_in_executor of asyncio.get_event_loop()


Folllowing is my code, which runs a long IO operation from an Async method using Thread Pool from Concurrent.Futures Package as follows:

# io_bound/threaded.py
import concurrent.futures as futures
import requests
import threading
import time
import asyncio

data = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

def sleepy(n):
    time.sleep(n//2)
    return n*2

async def ExecuteSleep():
    l = len(data)
    results = []
    # Submit needs explicit mapping of I/p and O/p
    # Output may not be in the same Order
    with futures.ThreadPoolExecutor(max_workers=l) as executor:
        result_futures = {d:executor.submit(sleepy,d) for d in data}
        results = {d:result_futures[d].result() for d in data}
    
    return results

if __name__ == '__main__':
    print("Starting ...")
    t1 = time.time()
    result  = asyncio.run(ExecuteSleep())
    print(result)
    print("Finished ...")
    t2 = time.time()
    print(t2-t1)

Following is my question:

  1. What could be the potential issue if I run the Threadpool directly without using the following asyncio apis:

    loop = asyncio.get_event_loop()
    loop.run_in_executor(...)
    

I have reviewed the docs, ran simple test cases to me this looks perfectly fine and it will run the IO operation in the Background using the Custom thread pool, as listed here, I surely can't use pure async await to receive the Output and have to manage calls using map or submit methods, beside that I don't see a negative here.

Ideone link of my code https://ideone.com/lDVLFh


Solution

  • What could be the potential issue if I run the Threadpool directly

    There is no issue if you just submit stuff to your thread pool and never interact with it or wait for results. But your code does wait for results¹.

    The issue is that ExecuteSleep is blocking. Although it's defined as async def, it is async in name only because it doesn't await anything. While it runs, no other asyncio coroutines can run, so it defeats the main benefit of asyncio, which is running multiple coroutines concurrently.


    ¹ Even if you remove the call to `result()`, the `with` statement will wait for the workers to finish their jobs in order to be able to terminate them. If you wanted the sync functions to run completely in the background, you could make the pool global and not use `with` to manage it.