Copy all datasets in a key in HDF file


I am trying to merge two HDF5 files: three key groups need to be extracted from one file and a fourth key group from another. I am writing a script that copies a single key group, but I am not sure I have the right syntax for accessing the datasets in that group.

with hp.File('destFile.h5','w') as f_dest:
    with hp.File('test.h5','r') as f_src:
        for members in list(f1.get('EnvObjects'))[1:]:     
        
    f_src.copy(f_src[EnvObjects/members],f_dest[EnvObjects/members],"EnvObjects)

The error I get says that EnvObjects doesn't exist, even though it does. What is the right syntax for accessing and copying specific key groups from one HDF5 file to another?


Solution

  • The default behavior of the h5py .copy() method on Group objects is exactly what you need here. From the h5py documentation: "If the source is a Group object, by default all objects within that group will be copied recursively."

    So you can copy all the datasets without looping over their names. Note that this recursively copies every object in group 'EnvObjects', sub-groups included. More details on this behavior follow the example below:

    import h5py

    with h5py.File('test.h5', 'r') as h5src, \
         h5py.File('destFile2.h5', 'w') as h5dest:
        # Copy the whole 'EnvObjects' group (recursively) into the destination file
        h5src.copy(h5src['EnvObjects'], h5dest, 'EnvObjects')
    

    FYI, the code above opens both files in a single with statement. That is a little cleaner in your situation, and it removes one level of indentation.
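
For the merge described in the question (several groups from one file, one group from another), the same recursive copy can be wrapped in a small helper. This is only a sketch: the function name merge_groups, the throwaway file names, and the group names other than 'EnvObjects' are made up for illustration, and it assumes the group names do not collide in the destination file.

```python
import os
import tempfile

import h5py
import numpy as np


def merge_groups(dest_path, sources):
    """Copy whole groups into one new file (hypothetical helper).

    `sources` maps a source file path to the list of group names to take from it.
    """
    with h5py.File(dest_path, "w") as h5dest:
        for src_path, group_names in sources.items():
            with h5py.File(src_path, "r") as h5src:
                for name in group_names:
                    # Recursive by default: datasets and sub-groups are copied too.
                    h5src.copy(h5src[name], h5dest, name)


# Build two small throwaway files so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
file_a = os.path.join(tmp_dir, "a.h5")
file_b = os.path.join(tmp_dir, "b.h5")
merged = os.path.join(tmp_dir, "merged.h5")

with h5py.File(file_a, "w") as f:
    for group in ("EnvObjects", "Sensors", "Actors"):
        f.create_dataset(f"{group}/values", data=np.arange(3))
with h5py.File(file_b, "w") as f:
    f.create_dataset("Metadata/info", data=np.ones(2))

# Three groups from the first file, one group from the second.
merge_groups(merged, {file_a: ["EnvObjects", "Sensors", "Actors"],
                      file_b: ["Metadata"]})

with h5py.File(merged, "r") as f:
    merged_groups = sorted(f.keys())
    env_values = f["EnvObjects/values"][:].tolist()
print(merged_groups)
```

Passing the destination File object plus a name (rather than indexing the destination with a path that does not exist yet) lets .copy() create the new group itself.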

    There are times you may not want to recursively copy all objects in a group, for example when the group has datasets and sub-groups and you only want the datasets. You can set the shallow=True parameter to copy only the immediate members of the group. However, that still copies the sub-groups along with the datasets; it just does not copy the objects inside those sub-groups, so the result is "kinda messy".

    Your looping method is handy for that situation (copying only the datasets). However, it needs a check that each member is a dataset object before copying. The code below shows how to do that:

    with h5py.File('test.h5', 'r') as h5src, \
         h5py.File('destFile.h5', 'w') as h5dest:
    
        for member in h5src['EnvObjects']:
            # Copy only datasets; skip sub-groups
            if isinstance(h5src[f'EnvObjects/{member}'], h5py.Dataset):
                h5src.copy(h5src[f'EnvObjects/{member}'], h5dest, f'EnvObjects/{member}')
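
To see how the three approaches differ, it can help to run them side by side on a small throwaway file and list what ends up in the destination. The sketch below is illustrative only: the file layout, the destination names ('recursive', 'shallow', 'datasets_only') and the helper list_paths are assumptions, not part of the original files.

```python
import os
import tempfile

import h5py
import numpy as np

tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "src.h5")
dest_path = os.path.join(tmp_dir, "dest.h5")

# Source layout: one dataset directly in the group, one inside a sub-group.
with h5py.File(src_path, "w") as f:
    f.create_dataset("EnvObjects/top", data=np.arange(4))
    f.create_dataset("EnvObjects/sub/nested", data=np.arange(2))


def list_paths(group):
    """Return the sorted relative paths of every object below `group`."""
    found = []
    group.visit(found.append)  # append returns None, so the walk continues
    return sorted(found)


with h5py.File(src_path, "r") as h5src, h5py.File(dest_path, "w") as h5dest:
    src_group = h5src["EnvObjects"]

    # 1) Default: recursive copy of everything in the group.
    h5src.copy(src_group, h5dest, "recursive")

    # 2) shallow=True: immediate members only.
    h5src.copy(src_group, h5dest, "shallow", shallow=True)

    # 3) Loop with an isinstance check: datasets only, no sub-groups.
    for member, obj in src_group.items():
        if isinstance(obj, h5py.Dataset):
            h5src.copy(obj, h5dest, f"datasets_only/{member}")

    recursive_paths = list_paths(h5dest["recursive"])
    shallow_paths = list_paths(h5dest["shallow"])
    datasets_only_paths = list_paths(h5dest["datasets_only"])

print("recursive:    ", recursive_paths)
print("shallow:      ", shallow_paths)
print("datasets only:", datasets_only_paths)
```

Comparing the three printed lists shows which objects each option carries over, which is a quick way to confirm the copy did what you intended before running it on the real files.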