OSError: Unable to open file (file signature not found) / End of HDF5 error back trace


I have a small (< 6 MB) .hdf file (obtained from the LAADS DAAC service). I tried opening it with both pandas and h5py, to no avail (code shown below). I also tested the file with:

$ h5dump -n data.hdf
h5dump error: unable to open file "data.hdf"

and

$ h5debug data.hdf  
cannot open file

All of this seems to point to a corrupt file, but the weird thing is that HDFView (v2.11) has absolutely no issues opening the same file.

What is going on here?


1.

import h5py
data = h5py.File(filename, 'r')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/h5py/_hl/files.py", line 394, in __init__
    swmr=swmr)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/h5py/_hl/files.py", line 170, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (file signature not found)

2.

import pandas as pd
data = pd.io.pytables.read_hdf(filename)

Traceback (most recent call last):
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/pandas/io/pytables.py", line 603, in open
    self._handle = tables.open_file(self._path, self._mode, **kwargs)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/tables/file.py", line 320, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/tables/file.py", line 784, in __init__
    self._g_new(filename, mode, **params)
  File "tables/hdf5extension.pyx", line 492, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5F.c", line 511, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1604, in H5F_open
    unable to read superblock
  File "H5Fsuper.c", line 413, in H5F__super_read
    file signature not found

End of HDF5 error back trace

Unable to open/create file 'data.hdf'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/pandas/io/pytables.py", line 368, in read_hdf
    store = HDFStore(path_or_buf, mode=mode, **kwargs)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/pandas/io/pytables.py", line 488, in __init__
    self.open(mode=mode, **kwargs)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/pandas/io/pytables.py", line 636, in open
    raise IOError(str(e))
OSError: HDF5 error back trace

  File "H5F.c", line 511, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1604, in H5F_open
    unable to read superblock
  File "H5Fsuper.c", line 413, in H5F__super_read
    file signature not found

End of HDF5 error back trace

Unable to open/create file 'data.hdf'

3.

import pandas as pd
data = pd.HDFStore(filename, mode='r')

Traceback (most recent call last):
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/pandas/io/pytables.py", line 603, in open
    self._handle = tables.open_file(self._path, self._mode, **kwargs)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/tables/file.py", line 320, in open_file
    return File(filename, mode, title, root_uep, filters, **kwargs)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/tables/file.py", line 784, in __init__
    self._g_new(filename, mode, **params)
  File "tables/hdf5extension.pyx", line 492, in tables.hdf5extension.File._g_new
tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5F.c", line 511, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1604, in H5F_open
    unable to read superblock
  File "H5Fsuper.c", line 413, in H5F__super_read
    file signature not found

End of HDF5 error back trace

Unable to open/create file 'data.hdf'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/pandas/io/pytables.py", line 488, in __init__
    self.open(mode=mode, **kwargs)
  File "/home/gabriel/miniconda3/envs/py3/lib/python3.7/site-packages/pandas/io/pytables.py", line 636, in open
    raise IOError(str(e))
OSError: HDF5 error back trace

  File "H5F.c", line 511, in H5Fopen
    unable to open file
  File "H5Fint.c", line 1604, in H5F_open
    unable to read superblock
  File "H5Fsuper.c", line 413, in H5F__super_read
    file signature not found

End of HDF5 error back trace

Unable to open/create file 'data.hdf'

Solution

  • I can reproduce the error message by pointing h5py at a file that is not HDF5 at all (here, a Python script):

    In [88]: h5py.File('echo.py','r')                                                              
    ---------------------------------------------------------------------------
    OSError                                   Traceback (most recent call last)
    <ipython-input-88-4c05cde6b6ff> in <module>
    ----> 1 h5py.File('echo.py','r')
    
    /usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py in __init__(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
        406                 fid = make_fid(name, mode, userblock_size,
        407                                fapl, fcpl=make_fcpl(track_order=track_order),
    --> 408                                swmr=swmr)
        409 
        410             if isinstance(libver, tuple):
    
    /usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
        171         if swmr and swmr_support:
        172             flags |= h5f.ACC_SWMR_READ
    --> 173         fid = h5f.open(name, flags, fapl=fapl)
        174     elif mode == 'r+':
        175         fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/_objects.pyx in h5py._objects.with_phil.wrapper()
    
    h5py/h5f.pyx in h5py.h5f.open()
    
    OSError: Unable to open file (file signature not found)
    

    With the file downloaded from your link (about 5 MB), I get the same failure:

    1614:~/mypy$ h5debug ../Downloads/data.hdf 
    HDF5-DIAG: Error detected in HDF5 (1.10.0-patch1) thread 139633948224384:
      #000: ../../../src/H5F.c line 579 in H5Fopen(): unable to open file
        major: File accessibilty
        minor: Unable to open file
      #001: ../../../src/H5Fint.c line 1208 in H5F_open(): unable to read superblock
        major: File accessibilty
        minor: Read failed
      #002: ../../../src/H5Fsuper.c line 273 in H5F__super_read(): file signature not found
        major: File accessibilty
        minor: Not an HDF5 file
    cannot open file
    

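    So it looks like the file is HDF4, not HDF5. The HDF5 library fails because the file does not contain the HDF5 file signature, not because it is corrupt. Converting it with: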

    h5fromh4 -v ../Downloads/data.hdf 
    

    produces a data.h5 file containing a single dataset, "data", which h5py opens without trouble:

    In [3]: f = h5py.File('../Downloads/data.h5','r')                                              
    In [4]: f                                                                                      
    Out[4]: <HDF5 file "data.h5" (mode r+)>
    In [5]: list(f.keys())                                                                         
    Out[5]: ['data']
    In [9]: f['data']                                                                              
    Out[9]: <HDF5 dataset "data": shape (680, 451), type "<f8">
    

    In HDFView I can see that the file is HDFEOS_V2.19.

    With pyhdf (and the relevant HDF4 libraries installed) I can read the original file directly:

    In [3]: from pyhdf.SD import SD, SDC                                                           
    In [5]: f = SD('../Downloads/data.hdf', SDC.READ)                                              
    In [6]: f.datasets()                                                                           
    Out[6]: 
    {'Longitude': (('Cell_Along_Swath:mod04', 'Cell_Across_Swath:mod04'),
      (680, 451),
      5,
      0),
    ...
    

    along with the other datasets, matching those listed by HDFView.
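
A quick way to tell the two formats apart before choosing a library is to look at the first bytes of the file. The sketch below is only an illustration, not part of the original answer: the function name `sniff_hdf_format` is made up here, and it relies on the signature bytes given in the HDF4 and HDF5 format specifications. It uses only the standard library, so it works even when neither h5py nor pyhdf is installed.

```python
from pathlib import Path

# Leading "magic" bytes as given in the HDF5 and HDF4 file format specs.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"


def sniff_hdf_format(path):
    """Return 'hdf5', 'hdf4', or 'unknown' based on the file's signature.

    Hypothetical helper for illustration; it only inspects signature
    bytes and does not validate the rest of the file.
    """
    size = Path(path).stat().st_size
    with open(path, "rb") as fh:
        # HDF4 files start with a fixed 4-byte magic number.
        if fh.read(4) == HDF4_SIGNATURE:
            return "hdf4"
        # HDF5 permits an optional user block, so the 8-byte signature
        # may sit at offset 0, 512, 1024, 2048, ... (doubling each time).
        offset = 0
        while offset < size:
            fh.seek(offset)
            if fh.read(8) == HDF5_SIGNATURE:
                return "hdf5"
            offset = 512 if offset == 0 else offset * 2
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python sniff.py data.hdf
    for name in sys.argv[1:]:
        print(name, sniff_hdf_format(name))
```

If this reports 'hdf4' for a file like the one in the question, that is the cue to reach for pyhdf or a converter rather than h5py or pandas. When h5py is available, `h5py.is_hdf5(path)` gives a yes/no answer for the HDF5 side of the same check.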