I am using the following syntax to overwrite part of an hdf5 file in Python:
import h5py
f = h5py.File(file_path, 'r+')  # 'r+' (read/write) is needed; 'r' would raise an error on assignment
dset = f["mykey"]
dset[:3] = [1,2,3]
f.close()
It seems to work, but I could not find anything in the documentation about how this update is performed. I am wondering whether the dataset is (1) loaded into memory, (2) updated, and (3) written back in full, or whether only the affected piece of data is updated on disk.
I am asking because I want to reimplement this for .npy files, and I have the choice between loading the data, updating it, and rewriting the whole file, or using seek and making only the necessary update on disk.
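For the .npy route, there is a middle ground between those two options: numpy's memory mapping. Opening the file with np.load(..., mmap_mode='r+') maps it without reading it eagerly, and a slice assignment updates only the touched region on disk. A minimal sketch (the file path is just an example):

```python
import os
import tempfile

import numpy as np

# Hypothetical example file.
path = os.path.join(tempfile.mkdtemp(), "data.npy")
np.save(path, np.zeros(10))

# mmap_mode='r+' gives a writable memory map: the data is not
# read eagerly, and assignments are written back to the file.
arr = np.load(path, mmap_mode="r+")
arr[:3] = [1, 2, 3]   # updates only these elements
arr.flush()           # make sure the change reaches the disk
del arr               # close the memmap

print(np.load(path)[:3])
```

This avoids both a full rewrite and manual byte-offset arithmetic; the OS pages in and writes back only what was touched.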
Have you studied the h5py docs, especially the page about datasets? It's all there.
Here's what I've deduced from reading those docs and from answering a variety of SO questions.
f = h5py.File(file_path, 'r+')
dset = f["mykey"]
dset
is the dataset object; it refers to data stored in the file, not an in-memory copy.
arr = dset[:]
would load the whole dataset into an in-memory numpy array.
dset[:3] = [1,2,3]
this, on the other hand, writes np.array([1,2,3])
to the dataset in the file; that is, it modifies the first 3 elements of the on-disk dataset.
f.close()
Due to buffering etc., that write might not actually hit the disk until f is flushed or closed.
Since it is possible to load just a portion of the dataset
arr = dset[:3]
I deduce that it can perform the write without loading the whole dset. The actual implementation is a mix of Python and the compiled HDF5 C library, with Cython as the bridge.
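If you do want the explicit seek approach for .npy files, the format is a small header followed by the raw array bytes, so a contiguous slice maps to the byte offset header_end + index * itemsize. A sketch using numpy's public np.lib.format helpers (it assumes the plain version-1.0 header that np.save writes for small arrays, and the file path is just an example):

```python
import os
import tempfile

import numpy as np

# Hypothetical example file.
path = os.path.join(tempfile.mkdtemp(), "data.npy")
np.save(path, np.arange(10, dtype="<f8"))

new_vals = np.array([1.0, 2.0, 3.0], dtype="<f8")
start = 0  # index of the first element to overwrite

with open(path, "r+b") as f:
    # Parse the .npy header to find where the raw data begins.
    major, minor = np.lib.format.read_magic(f)
    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(f)
    data_offset = f.tell()  # file position right after the header

    # Seek to the target elements and overwrite just those bytes,
    # leaving the rest of the file untouched.
    f.seek(data_offset + start * dtype.itemsize)
    f.write(new_vals.astype(dtype).tobytes())

print(np.load(path))
```

This does the minimal on-disk update you describe, at the cost of handling the header and dtype byte layout yourself; the memmap approach is usually simpler and achieves the same effect.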