
How do I wait for ThreadPoolExecutor.map to finish?


I have the following code, which has been simplified:

import concurrent.futures

pool = concurrent.futures.ThreadPoolExecutor(8)

def _exec(x):
    return x + x

myfuturelist = pool.map(_exec,[x for x in range(5)])

# How do I wait for my futures to finish?

for result in myfuturelist:
    # Is this how it's done?
    print(result)

#... stuff that should happen only after myfuturelist is
#completely resolved.
# Documentation says pool.map is asynchronous

The documentation is weak regarding ThreadPoolExecutor.map. Help would be great.

Thanks!


Solution

  • Difference between map and submit

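    Executor.map runs the jobs in parallel and returns an iterator over the results right away. Iterating over it blocks until each result is ready and yields the results in input order, so by the time your for loop ends every call has finished; the wait is done for you. If you set a timeout, iteration raises TimeoutError when a result is not available within timeout seconds of the original call to map.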

    map(func, *iterables, timeout=None, chunksize=1)

    • the iterables are collected immediately rather than lazily;
    • func is executed asynchronously and several calls to func may be made concurrently.
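
As a minimal sketch of the behaviour described above: draining the iterator with list() is enough to wait for every call, and a timeout only surfaces during iteration. The _slow helper and the sleep durations are made up for illustration and are not part of the question's code.

```python
import concurrent.futures
import time


def _exec(x):
    # Sleep longer for smaller inputs so the calls finish out of order.
    time.sleep(0.05 * (4 - x))
    return x + x


def _slow(x):
    # Hypothetical helper, used only to trigger the timeout below.
    time.sleep(0.5)
    return x


with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    # list() drains the iterator, so this line blocks until every call is done.
    # Results come back in input order, not completion order.
    results = list(pool.map(_exec, range(5)))
    print(results)  # [0, 2, 4, 6, 8]

    # With a timeout, the error is raised while iterating, not by map() itself.
    timed_out = False
    try:
        list(pool.map(_slow, range(2), timeout=0.05))
    except concurrent.futures.TimeoutError:
        timed_out = True
    print("timed out:", timed_out)
```

Anything placed after the list(...) line runs only once all five calls have returned, which is what the question asks for.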

    To get a list of futures and do the wait manually, you can use:

    myfuturelist = [pool.submit(_exec, x) for x in range(5)]
    

    Executor.submit returns a Future object; calling result() on it explicitly waits for that future to finish:

    myfuturelist[0].result() # wait for the 1st future to finish and return its result
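
Put together, the submit approach might look like the sketch below. It reuses the names from the question; calling result() on every future in turn is one simple way to wait for all of them.

```python
import concurrent.futures


def _exec(x):
    return x + x


pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

# submit() returns immediately with one Future per call.
myfuturelist = [pool.submit(_exec, x) for x in range(5)]

# result() blocks until that particular future is done, so once this
# comprehension finishes, every future has finished too.
results = [f.result() for f in myfuturelist]
print(results)  # [0, 2, 4, 6, 8]

all_done = all(f.done() for f in myfuturelist)

pool.shutdown()
```

If _exec raised an exception, result() would re-raise it in the calling thread, so errors are not silently lost with this pattern.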
    

    EDIT 2023-02-24

    Although the original answer is accepted, please check mway's and milkice's answers as well. I'll try to add some detail here.

    concurrent.futures.wait is the better way, and its return_when parameter lets you control how long to wait for the futures:

    • FIRST_COMPLETED, wait until the first finishes
    • FIRST_EXCEPTION, wait until the first raises exception or all finish
    • ALL_COMPLETED, wait until all finish

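    It returns a named tuple of two sets, the finished futures (done) and the unfinished ones (not_done):

    from concurrent.futures import wait, FIRST_COMPLETED, ALL_COMPLETED

    # wait for the first one to finish
    finished_set, unfinished_set = wait(myfuturelist, return_when=FIRST_COMPLETED)
    # wait for all of them
    wait(myfuturelist, return_when=ALL_COMPLETED)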
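
For a version that runs on its own, here is a small sketch of both modes. The sleep durations are arbitrary and only there so the futures finish at different times.

```python
import time
from concurrent.futures import (
    ALL_COMPLETED,
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    wait,
)


def _exec(x):
    # Input 0 returns at once; the others take noticeably longer.
    time.sleep(0.2 * x)
    return x + x


pool = ThreadPoolExecutor(max_workers=8)
myfuturelist = [pool.submit(_exec, x) for x in range(5)]

# Returns as soon as at least one future is done.
first = wait(myfuturelist, return_when=FIRST_COMPLETED)
print(len(first.done), "done,", len(first.not_done), "pending")

# Blocks until every future is done; not_done is then empty.
done, not_done = wait(myfuturelist, return_when=ALL_COMPLETED)

# done is a set, so sort the results if the order matters.
results = sorted(f.result() for f in done)
print(results)  # [0, 2, 4, 6, 8]

pool.shutdown()
```

Note that wait returns sets, so the input order is lost; iterate over myfuturelist instead of done if you need results in submission order.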
    

    Using with is elegant, but note that:

    • the with statement itself does not hand you the return values (you can work around this, for example by storing the futures or results in a variable that outlives the block, such as a nonlocal or global one)
    • the pool is shut down when the block exits, which means you can't reuse it to save the cost of creating and destroying threads.
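
One possible workaround for the first point is to keep the futures in a list created inside the block. The sketch below also illustrates the second point: once the block exits, the pool no longer accepts work.

```python
import concurrent.futures


def _exec(x):
    return x + x


with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    myfuturelist = [pool.submit(_exec, x) for x in range(5)]
# Leaving the block calls pool.shutdown(wait=True), so all futures are done here.

all_done = all(f.done() for f in myfuturelist)
results = [f.result() for f in myfuturelist]
print(results)  # [0, 2, 4, 6, 8]

# The pool is closed now; submitting again raises RuntimeError.
try:
    pool.submit(_exec, 1)
    reusable = True
except RuntimeError:
    reusable = False
print("pool reusable:", reusable)  # pool reusable: False
```

If the pool has to live across several batches of work, creating it once without with and waiting via wait or result(), as shown earlier in this answer, avoids the repeated setup cost.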