Create HDF5 compound dataset with field/column labels


I am having trouble creating the HDF5 file schema shown in the image below. I can create the first-level groups without a problem, and I can create the traces and the labels datasets, but I am struggling to recreate the metadata dataset because it contains structures, and I have no idea how to attach those to the dataset. Ideally the solution would be in Python. [Image: ASCAD data structure] Thanks for any help.

I wrote the code below because I first thought the structures could be put into the dataset and then labeled. This throws an error, and I suspect I went about it completely the wrong way because I have no idea how to create the structures.

dset = grp.create_dataset("traces", data=arr)

arr2=np.array(mat['plaintext'], dtype=int)
arr3=  np.array([43,126,21,22,40,174,210,166,171,247,21,136,9,207,79,60])
dat = []
for i in arr2:
    met = [i ,arr3]
    dat.append(met)
dati = np.array(dat)
print(dat)
deset2 = grp.create_dataset("metadata", data=dati,chunks=True)
deset2.name
'/measurements/metadata'
f['/measurements/metadata'].dims[0]='plaintext'
f['/measurements/metadata'].dims[1] = 'key'
f.close()

Solution

  • The key is creating an appropriate NumPy dtype. You can use it to create an empty dataset, then add the data in a separate step. Alternatively, you can use it to create a NumPy recarray, populate the array with data, then create the dataset and load the data in one step (the dataset takes its shape and dtype from the recarray).

    I wrote my answer based on my interpretation of the metadata/metadata_i dataset dtype shown in the image.

    This code creates 3 empty datasets with 10 rows.

    import h5py
    import numpy as np

    with h5py.File('SO_75792317_empty.h5', 'w') as h5f:
        grp = h5f.create_group('Attack_traces/metadata')
        for i in range(1, 4):
            # Compound dtype: one named field per column
            meta_dt = np.dtype([(f'plaintext_{i}', 'S16'), (f'key_{i}', 'S16'),
                                (f'ciphertext_{i}', 'S16'), (f'masks_{i}', 'S16'),
                                (f'desync_{i}', 'S1')])
            # Empty dataset: 10 rows, each with the fields defined above
            grp.create_dataset(f'metadata_{i}', shape=(10,), dtype=meta_dt, chunks=True)
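
    The "add the data in another step" part is not shown above, so here is a minimal sketch of one way it might be done for a single dataset. The file location (a temporary directory) and the row values are placeholders chosen for illustration, not part of the original answer.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file path in a temporary directory, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'empty_then_fill.h5')

meta_dt = np.dtype([('plaintext_1', 'S16'), ('key_1', 'S16'),
                    ('ciphertext_1', 'S16'), ('masks_1', 'S16'),
                    ('desync_1', 'S1')])

# Step 1: create the empty compound dataset.
with h5py.File(path, 'w') as h5f:
    grp = h5f.create_group('Attack_traces/metadata')
    grp.create_dataset('metadata_1', shape=(10,), dtype=meta_dt, chunks=True)

# Step 2: reopen the file in read/write mode and fill the rows.
with h5py.File(path, 'r+') as h5f:
    dset = h5f['Attack_traces/metadata/metadata_1']
    rows = np.zeros(10, dtype=meta_dt)      # unfilled fields stay empty
    rows['plaintext_1'] = [f'ptext_row_{j}' for j in range(1, 11)]
    rows['key_1'] = [f'key_row_{j}' for j in range(1, 11)]
    rows['desync_1'] = 'T'                  # same value for every row
    dset[...] = rows                        # write all rows at once
    first_row = dset[0]                     # read one record back
    field_names = dset.dtype.names
```

    Building the rows in memory and writing them with one assignment keeps the number of HDF5 write calls small; slices such as dset[0:5] = rows[0:5] should work the same way for partial writes.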
    

    This code creates a NumPy recarray (populated with 10 rows of data) and uses it to create each metadata_i dataset:

    import h5py
    import numpy as np

    with h5py.File('SO_75792317_data.h5', 'w') as h5f:
        grp = h5f.create_group('Attack_traces/metadata')
        for i in range(1, 4):
            meta_dt = np.dtype([(f'plaintext_{i}', 'S16'), (f'key_{i}', 'S16'),
                                (f'ciphertext_{i}', 'S16'), (f'masks_{i}', 'S16'),
                                (f'desync_{i}', 'S1')])
            # Allocate 10 rows, then fill each field (column) by name
            rec_arr = np.empty(shape=(10,), dtype=meta_dt)
            rec_arr[f'plaintext_{i}'] = [f'ptext_{i}_row_{j}' for j in range(1, 11)]
            rec_arr[f'key_{i}'] = [f'key_{i}_row_{j}' for j in range(1, 11)]
            rec_arr[f'ciphertext_{i}'] = [f'ctext_{i}_row_{j}' for j in range(1, 11)]
            rec_arr[f'masks_{i}'] = [f'masks_{i}_row_{j}' for j in range(1, 11)]
            rec_arr[f'desync_{i}'] = 'T'  # same value in every row

            # Dataset shape and dtype are taken from rec_arr
            grp.create_dataset(f'metadata_{i}', data=rec_arr, chunks=True)
    

    If my interpretation of the metadata dtype is not correct, you can get it with the following code, then add it to your original question:

    with h5py.File('ASCAD.h5', 'r') as h5f:
        print(h5f['Attack_traces/metadata/metadata_1'].dtype)
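
For the data in the question (a plaintext array plus a fixed 16-value key), the same idea might look like the sketch below. It assumes each field holds 16 unsigned bytes per row, which is a guess about the layout and not something confirmed by the image. The random plaintexts stand in for mat['plaintext'], and the file path is hypothetical.

```python
import os
import tempfile

import h5py
import numpy as np

# Assumed layout: every field is a 16-element uint8 sub-array per row.
meta_dt = np.dtype([('plaintext', np.uint8, (16,)),
                    ('key', np.uint8, (16,))])

# Stand-in for mat['plaintext'] from the question: 10 rows of 16 bytes.
rng = np.random.default_rng(0)
plaintexts = rng.integers(0, 256, size=(10, 16), dtype=np.uint8)
key = np.array([43, 126, 21, 22, 40, 174, 210, 166,
                171, 247, 21, 136, 9, 207, 79, 60], dtype=np.uint8)

# One record per trace; fields are addressed by name.
meta = np.empty(len(plaintexts), dtype=meta_dt)
meta['plaintext'] = plaintexts
meta['key'] = key  # the single key is broadcast to every row

# Hypothetical file path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'numeric_metadata.h5')
with h5py.File(path, 'w') as h5f:
    grp = h5f.create_group('measurements')
    grp.create_dataset('metadata', data=meta, chunks=True)

# Read the compound dataset back into a structured array.
with h5py.File(path, 'r') as h5f:
    stored = h5f['measurements/metadata'][:]
```

With this layout the field names replace the dims labels attempted in the question: stored['plaintext'] and stored['key'] each come back as a (10, 16) array.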