python numpy concatenation hdf5 h5py

Custom column names in HDF5 file using h5py


I have the following code snippet:

import h5py
import numpy

## Data set with shape (5, 5) and numpy array containing column names as strings
data = numpy.random.random((5, 5))
column_names = numpy.array(["a", "b", "c", "d", "e"])

## Create file pointer
fp = h5py.File("data_set.HDF5", "w")

## Store data
fp["sub"] = data

## Close file
fp.close()

How do I add the names for the columns in the HDF5 file as indicated by the arrow in the included figure?



Solution

  • The trick is to use a NumPy dtype to define the field (column) names, then use that dtype to create a record array. Because each field has its own type, you can also mix types in the same row (say ints, floats, and strings).

    Modified example below:

    import h5py
    import numpy as np
    
    ## Data set with shape (5, 5) and list containing column names as strings
    data = np.random.rand(5, 5)
    col_names = ["a", "b", "c", "d", "e"]

    ## Create file pointer (the context manager closes the file on exit)
    with h5py.File("data_set_2.HDF5", "w") as fp:
        ## Structured dtype: one float field per column name
        ds_dt = np.dtype({'names': col_names,
                          'formats': [float] * len(col_names)})
        ## View the data as a record array with named fields
        rec_arr = np.rec.array(data, dtype=ds_dt)
        ## Store data (replaces: fp["sub"] = data)
        ds1 = fp.create_dataset('sub', data=rec_arr)
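
To check that the names were really stored, you can read the dataset back and look at its dtype. The sketch below is one possible variation, not the only way to do it: it builds the structured array with `numpy.lib.recfunctions.unstructured_to_structured` instead of `np.rec.array`, which gives a 1-D array with one record per row (as far as I can tell, `np.rec.array` on a (5, 5) float array returns a view of shape (5, 1)). The file name `data_set_3.HDF5` and the temporary directory are assumptions made only for this example.

```python
import os
import tempfile

import h5py
import numpy as np
from numpy.lib import recfunctions as rfn

data = np.random.rand(5, 5)
col_names = ["a", "b", "c", "d", "e"]

# Structured dtype: one float64 field per column name
ds_dt = np.dtype({"names": col_names,
                  "formats": [np.float64] * len(col_names)})

# Turn the last axis into named fields: shape (5, 5) -> shape (5,)
table = rfn.unstructured_to_structured(data, dtype=ds_dt)

# Hypothetical output location, used only for this sketch
path = os.path.join(tempfile.mkdtemp(), "data_set_3.HDF5")

# Write the structured array; h5py stores it as an HDF5 compound type
with h5py.File(path, "w") as fp:
    fp.create_dataset("sub", data=table)

# Read it back and inspect the stored field names
with h5py.File(path, "r") as fp:
    ds = fp["sub"]
    stored_names = ds.dtype.names   # tuple of field (column) names
    stored_shape = ds.shape
    loaded = ds[:]                  # whole dataset as a NumPy structured array

# Columns can now be selected by name
col_c = loaded["c"]
print(stored_names, stored_shape)
```

If you do mix types, note that h5py does not accept NumPy's Unicode (`U`) dtype, so string fields usually need a fixed-length byte string format such as `'S10'` or an h5py string dtype.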