Is it possible in HDF5 to store physical units together with the components of a compound datatype?
To give an example, consider geographic coordinates. A location can be expressed in different ways: latitude, longitude and radius may be given in degrees and kilometers, but radians and meters are just as valid. How can I store this unit information using h5py?
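HDF5 has no built-in notion of physical units, so you have to design a schema that documents the units of your coordinate data. How you do this is up to you. I can think of at least 3 approaches:

1. Create one dataset per coordinate and store its units in a dataset-level attribute.
2. Create one compound dataset holding all 3 coordinates, and record the units both in the field (column) names and in attributes on that dataset.
3. Create one compound dataset holding all 3 coordinates plus 3 extra string fields (columns) that hold the units; no attributes are used.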
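Whichever layout you choose, it helps if the field-to-unit lookup is mechanical rather than depending on someone reading column names. Below is a minimal sketch of the attribute-based idea wrapped in two small helpers. `write_with_units`, `read_units` and the `'<field> Units'` attribute naming are hypothetical choices of mine, not part of h5py. The file is opened with h5py's in-memory `core` driver (`backing_store=False`) so nothing is written to disk.

```python
import h5py
import numpy as np


def write_with_units(group, name, columns):
    """Create a 1-D compound dataset from {field: (values, unit)} and store
    each field's unit in a dataset attribute named '<field> Units'.
    (The attribute naming scheme is a hypothetical convention.)"""
    arrays = {field: np.asarray(values) for field, (values, _) in columns.items()}
    n_rows = len(next(iter(arrays.values())))
    # Build the compound dtype from the column names and their array dtypes
    dt = np.dtype([(field, arr.dtype) for field, arr in arrays.items()])
    data = np.empty(n_rows, dtype=dt)
    for field, arr in arrays.items():
        data[field] = arr
    dset = group.create_dataset(name, data=data)
    for field, (_, unit) in columns.items():
        dset.attrs[f'{field} Units'] = unit
    return dset


def read_units(dset):
    """Return {field: unit} for each compound field that has a unit attribute."""
    return {field: dset.attrs[f'{field} Units']
            for field in dset.dtype.names
            if f'{field} Units' in dset.attrs}


rng = np.random.default_rng(0)
with h5py.File('units_demo.h5', 'w', driver='core', backing_store=False) as h5f:
    dset = write_with_units(h5f, 'Coords', {
        'Long': (rng.uniform(-180., 180., 10), 'Degrees'),
        'Lat': (rng.uniform(-90., 90., 10), 'Degrees'),
        'Radius': (rng.uniform(6357., 6378., 10), 'km'),
    })
    units = read_units(dset)   # {'Long': 'Degrees', 'Lat': 'Degrees', 'Radius': 'km'}
    coords = dset[()]          # structured NumPy array with shape (10,)
```

Because the units live in attributes, the field names can stay plain (`Long` rather than `Long(Deg)`), and code can look a unit up without parsing column names.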
Here is a code sample that demonstrates all 3 methods with a small set of data (10 values of each coordinate). Hope this gives you some ideas.
```python
import h5py
import numpy as np

# Sample data: 10 random values for each coordinate
long_arr = np.random.uniform(-180., 180., 10)
lat_arr = np.random.uniform(-90., 90., 10)
rad_arr = np.random.uniform(6357., 6378., 10)

with h5py.File('SO_65977032.h5', mode='w') as h5f:
    # Method 1 creates 1 dataset for each coordinate.
    # Units are defined with a dataset-level attribute.
    grp1 = h5f.create_group('Method_1')
    grp1.create_dataset('Long', data=long_arr)
    grp1['Long'].attrs['Units'] = 'Degrees'
    grp1.create_dataset('Lat', data=lat_arr)
    grp1['Lat'].attrs['Units'] = 'Degrees'
    grp1.create_dataset('Radius', data=rad_arr)
    grp1['Radius'].attrs['Units'] = 'km'

    # Method 2 creates 1 compound dataset with all 3 coordinates.
    # Units are defined with 3 dataset-level attributes AND as part of
    # the field (column) names.
    grp2 = h5f.create_group('Method_2')
    llr_dt = [('Long(Deg)', float), ('Lat(Deg)', float), ('Radius(km)', float)]
    coords2 = grp2.create_dataset('Coords', dtype=llr_dt, shape=(10,))
    coords2['Long(Deg)'] = long_arr
    coords2.attrs['Long Units'] = 'Degrees'
    coords2['Lat(Deg)'] = lat_arr
    coords2.attrs['Lat Units'] = 'Degrees'
    coords2['Radius(km)'] = rad_arr
    coords2.attrs['Radius Units'] = 'km'

    # Method 3 creates 1 compound dataset with all 3 coordinates.
    # Units are defined with 3 additional fields (columns) of
    # fixed-length byte strings; attributes are NOT used to save units.
    grp3 = h5f.create_group('Method_3')
    llru_dt = [('Long', float), ('Long_units', 'S8'),
               ('Lat', float), ('Lat_units', 'S8'),
               ('Radius', float), ('Rad_units', 'S8')]
    coords3 = grp3.create_dataset('Coords', dtype=llru_dt, shape=(10,))
    coords3['Long'] = long_arr
    coords3['Long_units'] = ['Degrees'] * 10
    coords3['Lat'] = lat_arr
    coords3['Lat_units'] = ['Degrees'] * 10
    coords3['Radius'] = rad_arr
    coords3['Rad_units'] = ['km'] * 10
```
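Storing the units is only half the job; the reading side still has to act on them. Here is a minimal sketch for the Method 1 layout (one `Units` attribute per dataset) that converts values on read, so data stored in degrees/kilometers can be consumed as radians/meters. The `read_as` function and the `CONVERSIONS` table are hypothetical and only cover the unit strings used in this answer; the example file is built in memory with the `core` driver.

```python
import h5py
import numpy as np

# Hypothetical conversion table: (stored unit, requested unit) -> function.
# Only the unit strings used in this answer are covered.
CONVERSIONS = {
    ('Degrees', 'Radians'): np.deg2rad,
    ('Radians', 'Degrees'): np.rad2deg,
    ('km', 'm'): lambda x: x * 1000.0,
    ('m', 'km'): lambda x: x / 1000.0,
}


def read_as(dset, unit):
    """Read a dataset whose unit is stored in attrs['Units'] (Method 1)
    and return its values converted to `unit`."""
    stored = dset.attrs['Units']
    if isinstance(stored, bytes):  # fixed-length string attributes come back as bytes
        stored = stored.decode()
    values = dset[()]
    if stored == unit:
        return values
    try:
        return CONVERSIONS[(stored, unit)](values)
    except KeyError:
        raise ValueError(f'no conversion from {stored!r} to {unit!r}') from None


with h5py.File('units_demo2.h5', 'w', driver='core', backing_store=False) as h5f:
    grp = h5f.create_group('Method_1')
    grp.create_dataset('Lat', data=np.array([0.0, 90.0, -45.0]))
    grp['Lat'].attrs['Units'] = 'Degrees'
    grp.create_dataset('Radius', data=np.array([6371.0]))
    grp['Radius'].attrs['Units'] = 'km'
    lat_rad = read_as(grp['Lat'], 'Radians')   # approx. [0, pi/2, -pi/4]
    rad_m = read_as(grp['Radius'], 'm')        # [6371000.]
```

Keeping the conversion in one lookup table means an unknown unit string fails loudly instead of silently returning unconverted numbers.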