Is there a way for an HDF5 dataset to start small in size and automatically be resized to fit more and more items as they are appended into it?
I know that using h5py I can start small and make the dataset "unlimited" in size like so:
dset = file.create_dataset("my_dataset", (1024,), maxshape=(None,))
But AFAIK I still have to resize() the dataset as it nears its current capacity (1024 initially in the above example).
Is there a way for me not to have to resize() explicitly in my code?
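For reference, the explicit pattern I'd like to avoid looks roughly like this (just a sketch; the file name and sizes are made up):

import h5py
import numpy as np

with h5py.File('example.h5', 'w') as f:
    dset = f.create_dataset("my_dataset", (1024,), maxshape=(None,))
    new_items = np.random.random(100)
    end = dset.shape[0] + len(new_items)
    dset.resize(end, axis=0)              # have to grow the dataset by hand...
    dset[-len(new_items):] = new_items    # ...before writing past the old capacity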
Short answer: No.
I'm not an expert on the underlying HDF5 libraries, but I don't think they have this capability (and h5py is simply a wrapper). The (sort of) good news: h5py will throw an exception if you try to write beyond the allocated size.
Code below expands on your example to demonstrate.
import h5py
import numpy as np

with h5py.File('SO_68389770.h5', 'w') as h5f:
    dset = h5f.create_dataset("my_dataset", (1024,), maxshape=(None,))
    size = 100
    for i in range(10):
        arr = np.random.random(size)
        start, end = i*size, i*size+size
        dset[start:end] = arr  # all writes stay within the initial 1024 elements
This works with range(10). You will get this error for range(11):
TypeError: Can't broadcast (100,) -> (24,)
Code below handles any size cleanly by checking dset.shape[0] before writing.
# (same imports as above)
with h5py.File('SO_68389770.h5', 'w') as h5f:
    dset = h5f.create_dataset("my_dataset", (1024,), maxshape=(None,))
    size = 100
    for i in range(13):
        arr = np.random.random(size)
        start, end = i*size, i*size+size
        if dset.shape[0] >= end:
            dset[start:end] = arr
        else:
            print(f'insufficient dset size, end={end}; resizing')
            dset.resize(end, axis=0)  # grow along axis 0 before writing
            dset[start:end] = arr
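If you don't want to repeat that bounds check everywhere, you can wrap the pattern in a small helper. write_block below is just a name I made up for illustration; it isn't part of h5py:

def write_block(dset, start, arr):
    # Hypothetical helper: write arr into dset[start:start+len(arr)],
    # growing the dataset along axis 0 first if it is too small.
    end = start + arr.shape[0]
    if dset.shape[0] < end:
        dset.resize(end, axis=0)
    dset[start:end] = arr

with h5py.File('SO_68389770.h5', 'w') as h5f:
    dset = h5f.create_dataset("my_dataset", (1024,), maxshape=(None,))
    size = 100
    for i in range(13):
        write_block(dset, i*size, np.random.random(size))

This only keeps the resize logic in one place; under the hood it is exactly the same resize()-then-write pattern as above, so you still can't avoid the explicit resize() call entirely.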