I am running a parallelized code in Python and I am trying to save some values within each iteration. My code could be simplified/summarized as follows:
    # Import necessary libraries
    import h5py
    import multiprocessing as mp

    def func(a, b):
        # Generate some data and save it into "vector".
        # Create an HDF5 file and save the data in "vector".
        with h5py.File('/some_file.hdf5', 'w') as f:
            f.create_dataset('data_set', data=vector)
        # Some code

    # Parallelize func
    if __name__ == '__main__':
        with mp.Pool(2) as p:
            [p.apply_async(func, args=(elem, b)) for elem in big_array]
I save the files inside each iteration to limit memory usage, since I will be working with large amounts of data.
However, every time I run the script, no HDF5 file is generated and the data is not saved.
I am fairly new to parallelization in Python and I do not understand what the problem is.
In the end I replaced the with statement (the last two lines) with the following:
    p = mp.Pool(2)
    result = [p.apply_async(func, args=(elem, b)) for elem in big_array]
    p.close()
    p.join()
and it worked!
It seems that with the previous code, the with block is exited as soon as all the apply_async calls return, i.e. as soon as the tasks have been submitted, and exiting the block calls terminate() on the pool, which stops the worker processes before the calculations are done. The explicit close() followed by join() instead waits until every submitted task has finished.