
How to combine multiple hdf5 files into one file and dataset?


import h5py
import numpy as np

with h5py.File("myCardiac.hdf5", "w") as f:
    dset = f.create_dataset("mydataset", (100,), dtype='i')
    grp = f.create_group("G:/Brain Data/Brain_calgary/")

I tried this code to create an HDF5 file. There are 50 HDF5 files in a folder. I want to combine all 50 of them into a single HDF5 file and dataset.


Solution

  • To merge 50 .h5 files, each containing a dataset named kspace with shape (24, 170, 218, 256), into one large dataset, use this code:

    import h5py
    import os
    
    with h5py.File("myCardiac.hdf5", "w") as f_dst:
        h5files = [f for f in os.listdir() if f.endswith(".h5")]
    
        dset = f_dst.create_dataset("mydataset", shape=(len(h5files), 24, 170, 218, 256), dtype='f4')
    
        for i, filename in enumerate(h5files):
            with h5py.File(filename) as f_src:
                dset[i] = f_src["kspace"]
    

    Detailed description

    First, create the destination file myCardiac.hdf5. Then get the list of all .h5 files in the directory:

    h5files = [f for f in os.listdir() if f.endswith(".h5")]
    

    NOTE: os.listdir() without arguments returns the list of files/folders in the current working directory. I assume this Python script sits in the same directory as the source files and that the CWD is set to that directory.
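
    If the script will not run from the data directory, a pathlib-based variant makes the location explicit and also sorts the names so the merge order is deterministic. The scans/ directory below is a hypothetical placeholder:

    from pathlib import Path
    
    # Hypothetical directory holding the 50 .h5 files; adjust as needed.
    data_dir = Path("scans")
    
    # Full paths, sorted so that slot i always maps to the same file.
    h5files = sorted(data_dir.glob("*.h5"))
    

    h5py.File accepts these Path objects directly, so the rest of the code is unchanged.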

    The next step is to create a dataset in the destination file with the desired size and data type:

    dset = f_dst.create_dataset("mydataset", shape=(len(h5files), 24, 170, 218, 256), dtype='f4')
    
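    Optionally, create_dataset also accepts chunks and compression keywords. A minimal sketch, if smaller files matter more than raw write speed:

    dset = f_dst.create_dataset(
        "mydataset",
        shape=(len(h5files), 24, 170, 218, 256),
        dtype='f4',
        chunks=True,          # let h5py choose a chunk shape
        compression="gzip",   # transparent per-chunk compression
    )
    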

    You can then copy the data from each source file into the target dataset:

    for i, filename in enumerate(h5files):
        with h5py.File(filename, "r") as f_src:
            # Reads the whole "kspace" array and writes it into slot i.
            dset[i] = f_src["kspace"]
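    

    As a quick sanity check, you can reopen the merged file and confirm the combined dataset's shape and dtype (assuming 50 source files):

    import h5py
    
    with h5py.File("myCardiac.hdf5", "r") as f:
        print(f["mydataset"].shape)   # expected: (50, 24, 170, 218, 256)
        print(f["mydataset"].dtype)   # expected: float32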