
Is there a way to quickly extract specified tables into a different HDF5 file?


The problem I am trying to solve is this: I have a long-running Python process (it can take many hours to finish) that produces up to 80,000 HDF5 files. Because one of the bottlenecks is the constant opening and closing of these files, I wrote proof-of-concept code that writes a single HDF5 output file containing many tables. It certainly helps, but I wonder whether there is a quick(er) way to export specified tables (with renaming, if possible) into a separate file.


Solution

  • Yes, there are at least three ways to copy a dataset from one HDF5 file to another:

    1. The h5copy command-line utility from The HDF Group. You specify the source and destination HDF5 files, along with the source and destination objects. This likely does exactly what you want without much coding.
      Ref: HDF Group: H5Copy docs
    2. The h5py module has a copy() method on groups (and therefore on file objects) that copies a group or a dataset. You pass the source and destination objects, and the optional name argument lets you rename the copy.
    3. The PyTables module (aka tables) has a copy_node() method on its File object. A node is a group or a dataset. You pass the source and destination objects, and the newname argument lets you rename the copy.
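As a rough illustration of option 2, here is a minimal sketch using h5py. The helper name export_datasets, the file paths, and the dataset names are hypothetical and not from the question; the only library call it relies on is Group.copy() with its name argument, which sets the name of the copy in the destination file.

```python
import h5py


def export_datasets(src_path, dst_path, name_map):
    """Copy selected objects from one HDF5 file into another.

    name_map is a hypothetical mapping of
    {path of object in source file: new name in destination file}.
    """
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        for src_name, dst_name in name_map.items():
            # Group.copy() copies the dataset (or group) together with its
            # attributes; name= is the name the copy gets in the destination.
            src.copy(src_name, dst, name=dst_name)
```

A call such as export_datasets("all_runs.h5", "subset.h5", {"run_001/table": "table_a"}) would then write only the listed objects into subset.h5 under their new names. The source file is opened once for the whole batch, which is the point of the single-file approach described in the question.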

    If you choose to use h5py, there are a couple of relevant posts on SO: