Tags: python, io, hdf5, h5py

How does the writing process work in h5py Datasets?


I am using the following syntax to overwrite part of an hdf5 file in Python:

import h5py

f = h5py.File(file_path, 'r+')  # 'r+' opens the existing file for reading and writing
dset = f["mykey"]
dset[:3] = [1,2,3]
f.close()

It seems to be working, but I could not find information in the documentation about how this update is made. I am wondering whether the dataset is (1) loaded into memory, (2) updated, and (3) written back in its entirety, or whether only the affected piece of data is updated on disk.

I am asking because I want to reimplement this for .npy files, and I have the choice between loading the data, updating it, and rewriting the whole file, or seeking to the right offset and making only the necessary update on disk.
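
For the seek-style option on .npy files, I imagine something like the following sketch using NumPy's memory mapping (the file name "data.npy" is just a placeholder):

import numpy as np

# Memory-map the existing .npy file for reading and writing; slice assignment
# then touches only the affected bytes on disk instead of rewriting the array.
arr = np.load("data.npy", mmap_mode="r+")
arr[:3] = [1, 2, 3]   # updates just these elements in the mapped file
arr.flush()           # make sure the change reaches disk
del arr               # release the memory map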


Solution

  • So have you studied the h5py docs, especially the page about datasets? It's all there.

    Here's what I've deduced from reading those docs and answering a variety of SO questions.

    f = h5py.File(file_path, 'r+')
    dset = f["mykey"]
    

    dset is the dataset object; the data it refers to stays on disk in the file.

    arr = dset[:]
    

    would load the entire dataset into a NumPy array in memory.

    dset[:3] = [1,2,3]
    

    This, on the other hand, writes np.array([1, 2, 3]) to the dataset in the file; that is, it modifies only the first 3 elements on disk.

    f.close()
    

    Due to buffering, that write might not actually reach disk until the file is flushed or closed.
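
    If you want to make that explicit, here's a hedged sketch (same file_path and "mykey" as above): h5py.File works as a context manager, and f.flush() asks HDF5 to push buffered writes to disk.

    import h5py

    # The with block guarantees the file is flushed and closed even if an
    # exception is raised; flush() can also be called explicitly earlier.
    with h5py.File(file_path, 'r+') as f:
        dset = f["mykey"]
        dset[:3] = [1, 2, 3]
        f.flush()   # push buffered writes to disk right now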

    Since it is possible to load just a portion of the dataset

    arr = dset[:3]
    

    I deduce it can also perform the write without loading the whole dataset. The actual implementation is a mix of Python and C (the HDF5 C library), with Cython as the bridge.
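
    To see the partial I/O in action, here's a small experiment (the file name "demo.h5" is made up) that creates a dataset far larger than the slice being touched; only the selected region travels between memory and disk:

    import h5py

    with h5py.File("demo.h5", "w") as f:
        # ~1 GiB dataset; no 1 GiB array is ever built in Python memory
        dset = f.create_dataset("mykey", shape=(2**27,), dtype="float64")
        dset[:3] = [1, 2, 3]        # writes only the first three elements

    with h5py.File("demo.h5", "r") as f:
        print(f["mykey"][:3])       # reads only that slice -> [1. 2. 3.]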