
Is an hdf5 file different from an hdf file?


I have several HDF files, but I cannot figure out how to open them in Python. When I tried reading one with h5py.File(filename, 'r'), it raised an OSError:

OSError: Unable to open file (file signature not found)

Solution

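  • There are two types of HDF files. The format currently developed by the HDF Group is HDF5. The older one is HDF4, which today is mostly found in legacy data. The two are completely different, incompatible formats, so h5py (which only reads HDF5) fails with "file signature not found" when given an HDF4 file.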

    To distinguish between the two you could look at the first 4 bytes of the file:

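    # Read the first 4 bytes; the context manager closes the file afterwards
    with open(filename, 'rb') as hf:
        bts = hf.read(4)
    if bts == b'\x89HDF':
        print('HDF5')
    elif bts == b'\x0e\x03\x13\x01':
        print('HDF4')
    else:
        print('unknown signature')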
    

    The meaning of the HDF4 header signature is this:

    bytes(x + 64 for x in b'\x0e\x03\x13\x01')
    

    returns b'NCSA', which stands for the National Center for Supercomputing Applications, where the HDF format was invented.

    b'\x89HDF' are the first 4 bytes of the 8-byte HDF5 signature b'\x89HDF\x0d\x0a\x1a\x0a'. The letters HDF identify the file format. The remaining non-printable bytes (a high-bit byte, CR LF, Ctrl-Z, LF) make the signature hard to mistake for ordinary text and expose common file-transfer damage, such as line-ending conversion.
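
    The signature check can be made a little more robust by comparing all 8 bytes of the HDF5 signature. The sketch below also allows for the fact that an HDF5 file may begin with a user block, in which case the signature sits at byte 512, 1024, 2048 and so on rather than at byte 0. The function name detect_hdf_version and the max_offset cap are made up for this illustration and are not part of any library.

```python
HDF5_SIGNATURE = b'\x89HDF\r\n\x1a\n'   # full 8-byte HDF5 signature
HDF4_SIGNATURE = b'\x0e\x03\x13\x01'    # 4-byte HDF4 magic number


def detect_hdf_version(path, max_offset=1 << 20):
    """Return 'HDF5', 'HDF4', or None for the file at `path`.

    `max_offset` is an arbitrary cap (an assumption of this sketch) on how
    far into the file to look for an HDF5 signature behind a user block.
    """
    with open(path, 'rb') as f:
        head = f.read(8)
        if head == HDF5_SIGNATURE:
            return 'HDF5'
        if head[:4] == HDF4_SIGNATURE:
            return 'HDF4'
        # Not at byte 0: probe the offsets where an HDF5 signature may
        # follow a user block (512, 1024, 2048, ... doubling each time).
        offset = 512
        while offset <= max_offset:
            f.seek(offset)
            if f.read(8) == HDF5_SIGNATURE:
                return 'HDF5'
            offset *= 2
    return None
```

    Calling detect_hdf_version(filename) before handing the file to h5py tells you up front whether a different reader is needed.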

    For reading HDF5 files you can use the h5py module supported by the HDF Group. For HDF4 files there are several Python bindings to the low-level HDF4 interface.

    NASA PyHDF has not been updated for a while. It supports not just generic HDF4 files but also NASA's own HDF-EOS extensions.

    The Unidata netCDF4 library can also read HDF4 files that use the Scientific Data Set (SD) interface, provided the underlying netCDF C library was built with HDF4 support:

    from netCDF4 import Dataset
    ds = Dataset(filename, 'r')
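
    Putting the signature check and the readers mentioned in this answer together, one possible approach is a small dispatcher that picks the library from the first 4 bytes. This is only a sketch: open_hdf is a hypothetical helper, it assumes h5py and pyhdf are installed, and it opens HDF4 files through pyhdf's SD (Scientific Data Set) interface, which will not cover every HDF4 file.

```python
def open_hdf(path):
    """Open `path` with a reader chosen from its signature (hypothetical helper)."""
    with open(path, 'rb') as f:
        magic = f.read(4)

    if magic == b'\x89HDF':
        # HDF5: third-party h5py, imported lazily so it is only
        # required when an HDF5 file is actually encountered.
        import h5py
        return h5py.File(path, 'r')

    if magic == b'\x0e\x03\x13\x01':
        # HDF4: third-party pyhdf, Scientific Data Set interface.
        from pyhdf.SD import SD, SDC
        return SD(str(path), SDC.READ)

    raise ValueError(f'{path!r} does not start with an HDF4 or HDF5 signature')
```

    The two return values are different kinds of object: with h5py you would inspect list(f.keys()), while with pyhdf the SD object's datasets() method lists the available datasets.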