Writing variable-length sequence to a compound array


I am using compound datatypes with h5py, where some elements are variable-length arrays, and I can't find a way to set such an item. The following MWE shows six different attempts (sequential indexing, which would not work in h5py anyway; fused indexing; read-modify-commit for columns and rows), none of which works.

What is the correct way? Why does h5py raise "Cannot change data-type for object array" when writing a list of integers to an int32 variable-length field?

import h5py
import numpy as np

with h5py.File('/tmp/test-vla.h5','w') as h5:
    dt=np.dtype([('a',h5py.vlen_dtype(np.dtype('int32')))])
    dset=h5.create_dataset('test',(5,),dtype=dt)
    dset['a'][2]=[1,2,3] # does not write the value back
    dset[2]['a']=[1,2,3] # does not write the value back
    dset['a',2]=[1,2,3]  # Cannot change data-type for object array
    dset[2,'a']=[1,2,3]  # Cannot change data-type for object array
    tmp=dset['a']; tmp[2]=[1,2,3]; dset['a']=tmp # Cannot change data-type for object array
    tmp=dset[2]; tmp['a']=[1,2,3]; dset[2]=tmp # 'list' object has no attribute 'dtype'

Solution

  • When working with compound datasets, I've found it's best to write all of a row's data in a single statement. I tweaked your code to show how to add 3 rows of data (each of a different length). Note how I: 1) define the row of data as a tuple; 2) define the list of integers with np.array(); and 3) don't reference the field name ['a'].

    import h5py
    import numpy as np

    with h5py.File('test-vla.h5', 'w') as h5:
        dt = np.dtype([('a', h5py.vlen_dtype(np.dtype('int32')))])
        dset = h5.create_dataset('test', (5,), dtype=dt)
        print(dset.dtype, dset.shape)
        # Assign each row as a 1-tuple containing a NumPy array
        dset[0] = (np.array([0, 1, 2]),)
        dset[1] = (np.array([1, 2, 3, 4]),)
        dset[2] = (np.array([0, 1, 2, 3, 4]),)
    

    For more info, take a look at this post on the HDF Group Forum under HDF5 Ancillary Tools / h5py:
    Compound datatype with int, float and array of floats
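
To check that the values really land in the file, the same approach can be wrapped in a small round-trip. This is only a sketch under a few assumptions: `write_vlen_row` and `roundtrip` are hypothetical helper names (not part of h5py), the dataset has the single field 'a' from the question, and the arrays are created as int32 explicitly so they match the field's base type.

```python
import os
import tempfile

import h5py
import numpy as np


def write_vlen_row(dset, index, values):
    """Write one variable-length int32 sequence into row `index`.

    Hypothetical helper: assumes a compound dataset with a single field.
    """
    # The whole row is assigned as a 1-tuple holding a NumPy array;
    # the field name is not used on the left-hand side.
    dset[index] = (np.asarray(values, dtype=np.int32),)


def roundtrip(path):
    """Create the dataset, write three rows, reopen the file, read them back."""
    dt = np.dtype([('a', h5py.vlen_dtype(np.dtype('int32')))])
    with h5py.File(path, 'w') as h5:
        dset = h5.create_dataset('test', (5,), dtype=dt)
        write_vlen_row(dset, 0, [0, 1, 2])
        write_vlen_row(dset, 1, [1, 2, 3, 4])
        write_vlen_row(dset, 2, [0, 1, 2, 3, 4])
    with h5py.File(path, 'r') as h5:
        dset = h5['test']
        # dset[i] reads one row; ['a'] then picks the field from the copy in memory.
        return [dset[i]['a'].tolist() for i in range(3)]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmpdir:
        print(roundtrip(os.path.join(tmpdir, 'test-vla.h5')))
```

Reading with dset[i]['a'] is fine because it only indexes the in-memory copy of the row; it is the assignment through such a copy that never reaches the file.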