
Compound types with units


I wonder if it is possible in HDF5 to store physical units together with the components of a compound datatype?

To give an example, consider geographic coordinates. A location can be indicated in different ways. Latitude, longitude and radius may be given in degrees and kilometers, but specifying them in radians and meters is just as good. How can I store this information using h5py?


Solution

  • HDF5 does not know about units. Therefore, you have to create an appropriate schema to document the units of your coordinate data. How you do this is really up to you. I can think of at least 3 approaches:

    1. Method 1 creates 1 dataset for each coordinate.
      -- Units are defined with a dataset-level attribute.
    2. Method 2 creates 1 compound dataset with all 3 coordinates.
      -- Units are defined with 3 dataset-level attributes AND as part of the field (column) names.
    3. Method 3 creates 1 compound dataset with all 3 coordinates.
      -- Units are defined with 3 additional fields (columns) (datatype of strings), so the unit is repeated in every row.
      -- Attributes are NOT used to save units.

    Here is a code sample that demonstrates all 3 methods with a small set of data (10 values of each coordinate). Hope this gives you some ideas.

    import h5py
    import numpy as np
    
    long_arr = np.random.uniform(-180.,180., 10)
    lat_arr  = np.random.uniform(-90.,90., 10)
    rad_arr  = np.random.uniform(6357.,6378., 10)
    
    with h5py.File('SO_65977032.h5', mode='w') as h5f:
    
        # Method 1 creates 1 dataset for each coordinate.
        # Units are defined with a dataset-level attribute.
        h5f.create_group('Method_1')
        
        h5f['Method_1'].create_dataset('Long', data=long_arr)
        h5f['Method_1']['Long'].attrs['Units']='Degrees'
    
        h5f['Method_1'].create_dataset('Lat', data=lat_arr)
        h5f['Method_1']['Lat'].attrs['Units']='Degrees'
    
        h5f['Method_1'].create_dataset('Radius', data=rad_arr)
        h5f['Method_1']['Radius'].attrs['Units']='km'
    
        # Method 2 creates 1 compound dataset with all 3 coordinates.
        # Units are defined with 3 dataset-level attributes AND as part of the field (column) names.
        h5f.create_group('Method_2')
    
        llr_dt = [ ('Long(Deg)', float), ('Lat(Deg)', float), ('Radius(km)', float)  ]
    
        h5f['Method_2'].create_dataset('Coords', dtype=llr_dt, shape=(10,))
        h5f['Method_2']['Coords']['Long(Deg)'] = long_arr
        h5f['Method_2']['Coords'].attrs['Long Units']='Degrees'
    
        h5f['Method_2']['Coords']['Lat(Deg)'] = lat_arr
        h5f['Method_2']['Coords'].attrs['Lat Units']='Degrees'
    
        h5f['Method_2']['Coords']['Radius(km)'] = rad_arr   
        h5f['Method_2']['Coords'].attrs['Radius Units']='km'
        
        # Method 3 creates 1 compound dataset with all 3 coordinates.
        # Units are defined with 3 additional fields (columns) (datatype of strings).
        # Attributes are NOT used to save units.
        h5f.create_group('Method_3')
    
        llru_dt = [ ('Long', float),  ('Long_units', 'S8'), 
                    ('Lat', float),   ('Lat_units', 'S8'),
                    ('Radius', float), ('Rad_units', 'S8')  ]
    
        h5f['Method_3'].create_dataset('Coords', dtype=llru_dt, shape=(10,))
        h5f['Method_3']['Coords']['Long'] = long_arr
        h5f['Method_3']['Coords']['Long_units'] = [ 'Degrees' for _ in range(10) ]
    
        h5f['Method_3']['Coords']['Lat'] = lat_arr
        h5f['Method_3']['Coords']['Lat_units'] = [ 'Degrees' for _ in range(10) ]
    
        h5f['Method_3']['Coords']['Radius'] = rad_arr
        h5f['Method_3']['Coords']['Rad_units'] = [ 'km' for _ in range(10) ]
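
    Storing the units is only half of the job, since a reader has to look them up again. Here is a minimal sketch of the round trip for the attribute approach: write a compound dataset, read the units back into a dict, and convert to radians and meters. The file name, the `<field>_units` attribute naming, and the 3 sample points are assumptions made up for this sketch, not anything HDF5 or h5py prescribes.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'units_demo.h5')

# Compound (structured) dtype with plain field names; units live in attributes.
llr_dt = np.dtype([('Long', float), ('Lat', float), ('Radius', float)])
coords = np.zeros(3, dtype=llr_dt)
coords['Long'] = [-75.0, 0.0, 120.0]
coords['Lat'] = [40.0, 0.0, -30.0]
coords['Radius'] = [6371.0, 6378.0, 6365.0]

with h5py.File(path, mode='w') as h5f:
    dset = h5f.create_dataset('Coords', data=coords)
    # One attribute per field, keyed '<field>_units' (an assumed convention).
    dset.attrs['Long_units'] = 'Degrees'
    dset.attrs['Lat_units'] = 'Degrees'
    dset.attrs['Radius_units'] = 'km'

with h5py.File(path, mode='r') as h5f:
    dset = h5f['Coords']
    data = dset[:]  # read the whole dataset into a NumPy structured array
    # Collect the unit string for every field of the compound type.
    units = {name: dset.attrs[name + '_units'] for name in data.dtype.names}

# Convert based on the stored units; values in other units pass through unchanged.
long_rad = np.deg2rad(data['Long']) if units['Long'] == 'Degrees' else data['Long']
radius_m = data['Radius'] * 1000.0 if units['Radius'] == 'km' else data['Radius']
```

    With this layout the unit is stored once per field instead of once per row, and the field names stay free of unit suffixes, so the same reading code works whichever units the writer chose.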