Tags: python, hdf5, h5py

Changing HDF5 compression filter via h5py


How can I read a dataset that was compressed with the lzf compression filter and re-save it with one of the native HDF5 filters such as szip or gzip (zlib)? Would simply reading it as shown in How to read HDF5 files in Python, and then writing it out with compression specified on the new dataset, work?


Solution

  • As @bnaecker said, you can copy the data from the existing dataset and create a new dataset with a different compression filter. The new dataset can live in the same file or in a new one (a sketch of the new-file variant follows the example). Note: szip requires special licensing, so the example below goes from lzf to gzip. The process is the same for any two compression filters; just change the compression= value.

    import h5py
    import numpy as np
    
    filename = "SO_64582861.h5"
    
    # Create random data
    
    arr1 = np.random.uniform(-1, 1, size=(10, 3))
    
    # Create initial HDF5 file with an lzf-compressed dataset
    with h5py.File(filename, "w") as h5f:
        h5f.create_dataset("ds_lzf", data=arr1, compression="lzf")
      
    # Re-Open HDF5 file in 'append' mode
    # Copy ds_lzf to ds_gzip with different compression setting
    # could also copy to a second HDF5 file
    with h5py.File(filename, "a") as h5f:
        # List all top-level objects (datasets and groups)
        print("Keys: %s" % h5f.keys())
        arr2 = h5f["ds_lzf"][:]
        h5f.create_dataset("ds_gzip", data=arr2, compression="gzip")