I am using h5py to store experiment data in an HDF5 container.
In an interactive session I open the file using:
measurement_data = h5py.File('example.hdf5', 'a')
I then write data to the file through some self-written functions (this can be many GB of data from an experiment lasting a couple of days). At the end of the experiment I would normally close the file using
measurement_data.close()
Unfortunately, from time to time the interactive session ends without my explicitly closing the file (accidentally killing the session, a power outage, an OS crash caused by other software). This always results in a corrupt file and the loss of all the data. When I try to open it, I get the error:
OSError: Unable to open file (File signature not found)
I also cannot open the file in HDFView, or in any other software I tried.
Opening and closing the file for every write access seems unattractive to me, because I am continuously writing data from many different functions and threads, so I would prefer a different solution.
The corruption problem is known to the HDF5 designers. They are working on fixing this in version 1.10 by adding journalling. In the meantime, you can call flush() periodically to make sure your writes have been flushed, which should limit the damage. You can also try external links, which let you store pieces of data in separate files but link them together into one structure when you read them.
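A rough sketch of how the two suggestions could be combined is below. The helper append_rows, the part_N.hdf5 file names, the run_N link names and the temporary directory are all made up for illustration; only h5py.File, File.flush(), Dataset.resize() and h5py.ExternalLink come from h5py itself. Note that flush() only hands the HDF5 buffers to the operating system, so it reduces the window for data loss but probably does not protect against every power outage.

```python
import os
import tempfile

import h5py
import numpy as np


def append_rows(dataset, rows):
    """Hypothetical helper: append rows to a resizable 1-D dataset, then flush."""
    old_len = dataset.shape[0]
    dataset.resize(old_len + len(rows), axis=0)
    dataset[old_len:] = rows
    # Push HDF5's internal buffers to the OS after every chunk of data.
    dataset.file.flush()


# Assumed layout: one master file plus one small file per part of the experiment.
workdir = tempfile.mkdtemp()
master_path = os.path.join(workdir, "example.hdf5")

for i in range(3):
    part_name = f"part_{i}.hdf5"  # illustrative name, not from the question

    # Each part lives in its own file, so a crash can damage at most
    # the part that is open at that moment.
    with h5py.File(os.path.join(workdir, part_name), "w") as part:
        dset = part.create_dataset(
            "data", shape=(0,), maxshape=(None,), dtype="f8"
        )
        append_rows(dset, np.arange(5, dtype="f8") + 10 * i)

    # The master file is opened only briefly to register the link.
    # A relative file name is resolved relative to the master file's directory.
    with h5py.File(master_path, "a") as master:
        master[f"run_{i}"] = h5py.ExternalLink(part_name, "/data")

# Reading through the master file follows the links transparently.
with h5py.File(master_path, "r") as master:
    total_rows = sum(master[name].shape[0] for name in master)
    last_run = master["run_2"][...]
```

With this layout the master file stays closed most of the time, and the long-running writers only ever hold one part file open.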