I need to run dozens of computationally intensive CPU-bound parallel tasks. Currently I do it using joblib
delayed
and Parallel
:
resultTuples = Parallel(n_jobs=-1, prefer="processes")(delayed(RunSingleTask)(*p) for p in run_params)
It works fine but I have to wait until all tasks have finished to get the list of results and process them after. And I'd like to get the result of each finished task right after it is ready to process it.
Tasks have no IO, so I don't see any purpose in using async stuff like:
for first_completed in asyncio.as_completed(tasks):
...
So how can I do this?
You can use ProcessPoolExecutor()
and as_completed()
:
from concurrent.futures import ProcessPoolExecutor, as_completed
def _sum(a, b):
return a + b
if __name__ == '__main__':
inputs = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
with ProcessPoolExecutor() as executor:
fts = [executor.submit(_sum, a, b) for a, b in inputs]
for f in as_completed(fts):
print(f.result())
3
7
11
15
19