hdf5 h5py

HDF5 `h5repack`: "Error occurred while repacking"


I'm running the following h5repack command on a new HDF5 file. I've repacked files in the past without any problem, but this HDF5 file is not cooperating. Possibly it's due to the group structure: I access the dataset I'm interested in via h5py.File(filepath)['recordings/rec0000/well000/groups/routed/raw']. In this related question I confirmed how to reference the group structure on the command line. However, I'm still getting the error shown below.

$ h5repack -v -f /recordings/rec0000/well000/groups/routed/raw:NONE -i 230208_c10984.h5 -o 230208_c10984.uncompressed.h5
No all objects to modify layout
No all objects to apply filter
 </recordings/rec0000/well000/groups/routed/raw> with NONE filter
Opening file. Searching 52 objects to modify ...
 </recordings/rec0000/well000/groups/routed/raw>Error occurred while repacking

Here's the (relevant?) part of the output of h5dump -pH 230208_c10984.h5:

               DATASET "raw" {
                  DATATYPE  H5T_STD_U16LE
                  DATASPACE  SIMPLE { ( 880, 6000000 ) / ( 880, H5S_UNLIMITED ) }
                  STORAGE_LAYOUT {
                     CHUNKED ( 880, 200 )
                     SIZE 1449463146 (7.285:1 COMPRESSION)
                  }
                  FILTERS {
                     USER_DEFINED_FILTER {
                        FILTER_ID 401
                        COMMENT mxw-data
                     }
                     COMPRESSION DEFLATE { LEVEL 0 }
                  }
                  FILLVALUE {
                     FILL_TIME H5D_FILL_TIME_IFSET
                     VALUE  H5D_FILL_VALUE_DEFAULT
                  }
                  ALLOCATION_TIME {
                     H5D_ALLOC_TIME_INCR
                  }
               }

I also tried changing the layout with --layout=/recordings/rec0000/well0000/groups/routed/raw:CHUNKED=1x30000 and got a similar error.

I verified that I can open and access the dataset in h5py:

>>> import h5py
>>> h5py.File('230208_c10984.h5')['recordings/rec0000/well000/groups/routed/raw']
<HDF5 dataset "raw": shape (880, 6000000), type "<u2">

Solution

  • This is my 2nd answer. It addresses the custom filter shown in the h5dump output.

    The h5dump output for FILTERS {} shows FILTER_ID 401, a user-defined (custom) compression filter. I checked the Registered Filter Plugins list at The HDF Group, and 401 isn't listed. I'm pretty sure h5repack doesn't have access to this filter, which would explain the error: to rewrite the dataset without compression, h5repack first has to decompress every chunk. You can check my assumption by trying to print the data in the dataset. Modify your h5dump command to:

    h5dump -d /recordings/rec0000/well000/groups/routed/raw -s "0,0" -c "1,10" 230208_c10984.h5
    Note: -d names the dataset; -s (start) and -c (count) take one value per dimension and limit the output to a small slice (here, the first 10 values of row 0)
    

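    If you get an error printing the data, I'm 99% sure the problem is a missing filter. You can see which HDF5 filter plugins are installed by looking in the plugin folder, e.g. $HDF5_HOME/lib/plugin. If the HDF5_PLUGIN_PATH environment variable is set, HDF5 searches the folders it names instead of the default location.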

    You can also check filter behavior with h5py. The 1st print statement shows the dataset name and its .compression attribute, and the 2nd reads the first 10 values of column 0. The read fails if h5py does not have the compression filter installed.

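    import h5py

    filepath = '230208_c10984.h5'
    with h5py.File(filepath, 'r') as h5f:
        ds = h5f['recordings/rec0000/well000/groups/routed/raw']
        print(ds.name, ds.compression)  # dataset path and the filter name h5py reports
        print(ds[0:10, 0])              # reading data raises an error if a filter is missing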
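
As far as I know, the .compression attribute only reports the filters h5py knows by name (gzip, lzf, szip), so it may not mention a custom filter such as ID 401. The check above can be extended by listing every filter in the dataset's pipeline by ID and asking the HDF5 library whether each one is available. This is a minimal sketch using h5py's low-level API; the helper name list_filters is hypothetical and not part of h5py.

```python
import h5py


def list_filters(ds):
    """Return (filter_id, name, available) for each filter on dataset ds."""
    plist = ds.id.get_create_plist()  # dataset creation property list
    result = []
    for i in range(plist.get_nfilters()):
        # get_filter returns (filter_id, flags, client_values, name)
        filter_id, _flags, _values, name = plist.get_filter(i)
        if isinstance(name, bytes):
            name = name.decode(errors="replace")
        # filter_avail reports whether this HDF5 build can run the filter
        available = bool(h5py.h5z.filter_avail(filter_id))
        result.append((filter_id, name, available))
    return result
```

Calling list_filters(h5f['recordings/rec0000/well000/groups/routed/raw']) on the file from the question would be expected to include an entry with ID 401. If its available flag is False, the plugin is not visible to the HDF5 library that h5py uses. Note that h5repack may be linked against a different HDF5 installation with its own plugin folder.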
    

    I don't know if this will solve your problem, but it should point you in the right direction. If the problem is the filter, you need to figure out what filter ID 401 is, then get its library and install it in the plugin folder so h5repack can access it. Good luck!
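
To see which plugin libraries HDF5 would find, the sketch below lists the shared libraries in each folder named by the HDF5_PLUGIN_PATH environment variable. It falls back to /usr/local/hdf5/lib/plugin, the usual default on Unix-like systems. The helper name find_plugins is hypothetical, and a particular HDF5 build may have been configured with a different default folder, so treat the fallback as an assumption.

```python
import os
from pathlib import Path

# Assumption: default plugin folder of a stock Unix HDF5 build.
DEFAULT_PLUGIN_DIR = "/usr/local/hdf5/lib/plugin"
LIB_SUFFIXES = {".so", ".dylib", ".dll"}


def find_plugins(env=None):
    """Map each plugin folder to the shared libraries found in it.

    A folder that does not exist maps to None.
    """
    env = os.environ if env is None else env
    raw = env.get("HDF5_PLUGIN_PATH", DEFAULT_PLUGIN_DIR)
    found = {}
    for folder in raw.split(os.pathsep):
        if not folder:
            continue  # skip empty entries such as a trailing separator
        path = Path(folder)
        if not path.is_dir():
            found[folder] = None
            continue
        found[folder] = sorted(
            p.name for p in path.iterdir() if p.suffix in LIB_SUFFIXES
        )
    return found


if __name__ == "__main__":
    for folder, libs in find_plugins().items():
        print(folder, "->", "missing folder" if libs is None else libs)
```

If no listed folder contains a library for filter 401, that is consistent with the missing-filter explanation. Libraries with a version suffix (for example a name ending in .so.1) are not matched by this simple suffix test.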