I'm implementing data collection for a Markov chain Monte Carlo inversion program. However, the MCMC runs can take a week or more to complete! Would it be better to open the file at the beginning of the run:
    with h5py.File('my_data.hdf5', 'r+', libver='latest') as fp:
        fp.swmr_mode = True
        mcmc_run(fp)
Or each time I want to add a dataset (inside mcmc_run()):
    with h5py.File('my_data.hdf5', 'r+', libver='latest') as fp:
        fp.swmr_mode = True
        fp['dataset'] = new_data
I have to save about 7 MB across 9 datasets for each acceptance (500 acceptances total over about a week of computation time, ~5000 iterations). Unfortunately, the data comes from several different objects inside the iteration, so I can't group them and open the file once per acceptance.
[Posting comment as an answer]
For runs that take that long, you may want to consider what happens if you have a power outage (as an MC veteran, this is my biggest fear). I recommend closing and re-opening the file for each write because it is probably safer: a file that stays open for many days is more exposed to corruption from a power outage, computer crash, etc.
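A minimal sketch of the close-and-reopen approach, in case it helps. The helper name `save_acceptance` and the `accept_00000`-style group names are made up for illustration, not anything from your code. I left out SWMR mode here because, as far as I know, HDF5 does not let you create new groups or datasets once SWMR mode is switched on, so you would have to pre-create resizable datasets if you need a concurrent reader.

```python
import h5py
import numpy as np


def save_acceptance(path, step, arrays):
    """Write one accepted sample's arrays to a new group, then close the file.

    `path`, `step` and `arrays` are hypothetical names for this sketch:
    `arrays` maps a dataset name to the array to store.
    """
    # Mode 'a' opens the file read/write and creates it if it does not exist.
    with h5py.File(path, "a") as fp:
        grp = fp.require_group(f"accept_{step:05d}")
        for name, data in arrays.items():
            grp.create_dataset(name, data=np.asarray(data))
    # Leaving the with-block closes the file, which flushes the data to disk.
```

Because the helper only needs the file path, each object inside the iteration can call it on its own, e.g. `save_acceptance('my_data.hdf5', 42, {'model': model_array})`, and the file is only open for the brief moment of the write. Opening and closing a few thousand times over a week should be negligible next to the computation itself.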