python hdf5 point-clouds h5py

How to combine multiple .h5 files (with the same shape) using Python?


I have 10,000 .h5 files containing 3D point clouds.

They all have the same shape.

I would like to merge them in batches of 2,000 so that I end up with 5 large .h5 files (similar to what append() does for a Python list).

I found the copy() function in h5py (http://docs.h5py.org/en/latest/high/group.html).

However, I have not been able to apply it to my problem.

Could you point me to example code or help me solve this?

Sorry for my poor English skills.


Solution

  • You can simply do something like this (untested but should work):

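    import os
    import h5py

    files = ['a.h5', 'b.h5']  # placeholder: replace with your input file paths

    def copy(dest, name):
        g = dest.require_group(name)  # create output group named after the input file
        def callback(name, node):
            if isinstance(node, h5py.Dataset):  # only copy datasets
                # node[()] reads the whole dataset; intermediate groups are created as needed
                g.create_dataset(name, data=node[()])
        return callback  # visititems() needs the callable itself

    with h5py.File('out.h5', 'w') as h5_out:
        for f_in in files:
            with h5py.File(f_in, 'r') as h5_in:
                # use the base name so a path containing '/' does not create nested groups
                h5_in.visititems(copy(h5_out, os.path.basename(f_in)))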
    

    This creates a "folder" (HDF5 group) for each input file and recursively copies every dataset into it. Attributes are not copied, because only the dataset values are read and rewritten.
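    Since the question mentions h5py's copy(), the same one-group-per-file layout can also be sketched with Group.copy, which copies datasets and subgroups recursively and keeps their attributes. The function name merge_with_copy and the choice to name each group after the file's base name are assumptions made for illustration:

```python
import os
import h5py

def merge_with_copy(files, out_path):
    """Copy every top-level object of each input file into its own group."""
    with h5py.File(out_path, "w") as h5_out:
        for path in files:
            # Illustrative naming choice: one group per input file, named after its base name.
            group = h5_out.require_group(os.path.basename(path))
            with h5py.File(path, "r") as h5_in:
                for key in h5_in:
                    # Group.copy works across files and copies subgroups recursively.
                    h5_in.copy(key, group)
```

    Calling merge_with_copy(files, 'out.h5') should give the same group structure as the callback version, without loading each dataset into memory by hand.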

    See also: related question.
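
If the goal is closer to append(), that is, one large dataset per output file rather than one group per input file, a resizable dataset may be a better fit. The following is a minimal sketch under assumptions not stated in the question: every input file stores its points in a dataset called "data" (a hypothetical name), and all datasets agree on every dimension except possibly the first. The helper names concatenate_files and batches are also made up for this example.

```python
import h5py

def concatenate_files(files, out_path, dataset="data"):
    """Append `dataset` from each input file along axis 0 of one output dataset."""
    with h5py.File(out_path, "w") as h5_out:
        out = None
        for path in files:
            with h5py.File(path, "r") as h5_in:
                block = h5_in[dataset][...]  # load one input file into memory
            if out is None:
                # maxshape=None on axis 0 makes the dataset growable along that axis.
                out = h5_out.create_dataset(
                    dataset,
                    data=block,
                    maxshape=(None,) + block.shape[1:],
                    chunks=True,
                )
            else:
                start = out.shape[0]
                out.resize(start + block.shape[0], axis=0)
                out[start:] = block

def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

With a list all_files of the 10,000 input paths, looping over enumerate(batches(all_files, 2000)) and calling concatenate_files(batch, f'merged_{i}.h5') would produce the 5 output files. Only one input file is held in memory at a time.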