I would like to parallelize a for loop in Python. The loop is fed by a generator, and I expect about 1 billion items.
It turned out that joblib has a giant memory leak:
Parallel(n_jobs=num_cores)(delayed(testtm)(tm) for tm in powerset(all_turns))
I do not want to store any data in this loop, just print something out occasionally, but the main process grows to 1 GB within seconds.
Are there any other frameworks that can handle such a large number of iterations?
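The standard library's multiprocessing.Pool handles this without materializing the generator: Pool.imap_unordered pulls items from the iterable lazily and yields results as workers finish, instead of collecting all one billion results into a list the way joblib's Parallel does. Passing a chunksize batches items per dispatch, which keeps the inter-process communication overhead manageable: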
from multiprocessing import Pool

if __name__ == "__main__":
    with Pool() as pool:  # use all available CPUs
        # imap_unordered consumes the generator lazily and yields results
        # as they complete; chunksize batches items per task to reduce
        # inter-process communication overhead
        for result in pool.imap_unordered(testtm, powerset(all_turns),
                                          chunksize=1000):
            print(result)
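For reference, here is a self-contained sketch of the whole pattern. The definitions are stand-ins: powerset is the usual itertools recipe, testtm is a hypothetical placeholder for the real per-item work, and all_turns is sized so the powerset has 2**30 (roughly one billion) elements.

from itertools import chain, combinations
from multiprocessing import Pool

def powerset(iterable):
    # standard itertools recipe: yields every subset of the input lazily
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def testtm(tm):
    # hypothetical stand-in for the real per-item work
    return len(tm)

if __name__ == "__main__":
    all_turns = range(30)  # 2**30 subsets, roughly one billion items
    with Pool() as pool:
        results = pool.imap_unordered(testtm, powerset(all_turns), chunksize=1000)
        for i, result in enumerate(results):
            if i % 10_000_000 == 0:  # print occasionally, store nothing
                print(i, result)

Note that the worker function must be defined at module level so it can be pickled and sent to the child processes.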