Tags: python, asynchronous, multiprocessing, python-multiprocessing, joblib

Retrieving results from multiprocessing tasks as each one finishes


I need to run dozens of computationally intensive, CPU-bound tasks in parallel. Currently I do this with joblib's delayed and Parallel:

resultTuples = Parallel(n_jobs=-1, prefer="processes")(delayed(RunSingleTask)(*p) for p in run_params)

It works fine, but I have to wait until every task has finished before I get the list of results and can process them. Instead, I'd like to receive each task's result as soon as it is ready and process it right away.

The tasks do no I/O, so I don't see any point in using asyncio constructs like:

for first_completed in asyncio.as_completed(tasks):
   ...

So how can I do this?


Solution

  • You can use concurrent.futures.ProcessPoolExecutor together with as_completed():

    from concurrent.futures import ProcessPoolExecutor, as_completed
    
    
    def _sum(a, b):
        return a + b
    
    
    if __name__ == '__main__':
        inputs = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
    
        with ProcessPoolExecutor() as executor:
            fts = [executor.submit(_sum, a, b) for a, b in inputs]
    
            # as_completed() yields each future as soon as its task finishes
            for f in as_completed(fts):
                print(f.result())
    
    

    Prints (the order may vary from run to run, since results arrive as tasks complete):

    3
    7
    11
    15
    19
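
  • Since as_completed() yields futures in the order they finish rather than the order they were submitted, you may also want to know which input produced each result. A small extension of the example above that keeps a future-to-input mapping:

    from concurrent.futures import ProcessPoolExecutor, as_completed


    def _sum(a, b):
        return a + b


    if __name__ == '__main__':
        inputs = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

        with ProcessPoolExecutor() as executor:
            # map each future to the input tuple it was created from
            fts = {executor.submit(_sum, a, b): (a, b) for a, b in inputs}

            for f in as_completed(fts):
                # look up the original input, then process the result immediately
                print(fts[f], '->', f.result())

  • Alternatively, the standard library's multiprocessing.Pool offers imap_unordered(), which likewise yields each result as soon as its worker finishes. It passes a single argument per task, so this sketch uses a small wrapper (_sum_args, a name introduced here) to unpack the input tuple:

    from multiprocessing import Pool


    def _sum_args(args):
        # imap_unordered passes one argument per task, so unpack the tuple here
        a, b = args
        return a + b


    if __name__ == '__main__':
        inputs = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

        with Pool() as pool:
            # results are yielded in completion order, not input order
            for result in pool.imap_unordered(_sum_args, inputs):
                print(result)

    If you would rather stay with joblib, recent versions also accept a return_as parameter on Parallel (for example return_as="generator") that streams results as they become available instead of collecting them all into a list; check your joblib version first, as older releases do not support it.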