Read a matlab .mat file using h5py


I want to use the Python 3 package h5py to read a MATLAB .mat file saved in the version 7.3 format.

The file contains a MATLAB variable named results.

results is a 1×1 cell array, and the data I need is a field of the struct inside that cell.

In MATLAB, I can get the data with the following code:

load('.mat PATH');
results{1}.res

How can I read this data with h5py? An example .mat file can be obtained from here


Solution

  • h5py can read the HDF5 files that MATLAB writes, but working out what they contain takes some exploring: look at the keys, groups, and datasets (and possibly the attributes). scipy won't help here, because scipy.io.loadmat only handles the older MATLAB .mat formats, not v7.3.

    With the downloaded file:

    In [61]: f = h5py.File('Downloads/Basketball_ECO_HC.mat','r')
    In [62]: f
    Out[62]: <HDF5 file "Basketball_ECO_HC.mat" (mode r)>
    In [63]: f.keys()
    Out[63]: <KeysViewHDF5 ['#refs#', 'results']>
    In [65]: f['results']
    Out[65]: <HDF5 dataset "results": shape (1, 1), type "|O">
    In [66]: arr = f['results'][:]
    In [67]: arr
    Out[67]: array([[<HDF5 object reference>]], dtype=object)
    In [68]: arr.item()
    Out[68]: <HDF5 object reference>
    

    I'm not familiar with HDF5 object references, so I'd have to check the h5py docs to see how to follow this one.

    But exploring the other key:

    In [69]: list(f.keys())[0]
    Out[69]: '#refs#'
    In [70]: f[list(f.keys())[0]]
    Out[70]: <HDF5 group "/#refs#" (2 members)>
    In [71]: f[list(f.keys())[0]].keys()
    Out[71]: <KeysViewHDF5 ['a', 'b']>
    In [72]: f[list(f.keys())[0]]['a']
    Out[72]: <HDF5 dataset "a": shape (2,), type "<u8">
    In [73]: _[:]
    Out[73]: array([0, 0], dtype=uint64)
    In [74]: f[list(f.keys())[0]]['b']
    Out[74]: <HDF5 group "/#refs#/b" (7 members)>
    In [75]: f[list(f.keys())[0]]['b'].keys()
    Out[75]: <KeysViewHDF5 ['annoBegin', 'fps', 'fps_no_ftr', 'len', 'res', 'startFrame', 'type']>
    In [76]: f[list(f.keys())[0]]['b']['fps']
    Out[76]: <HDF5 dataset "fps": shape (1, 1), type "<f8">
    In [77]: f[list(f.keys())[0]]['b']['fps'][:]
    Out[77]: array([[22.36617883]])
    

    In the OS shell, I can look at the file with h5dump. From that, it looks like the res dataset holds most of the data. The datasets also have attributes. h5dump may be a better way to get an overview, which can then guide the h5py loads.

    In [80]: f[list(f.keys())[0]]['b']['res'][:]
    Out[80]: 
    array([[198., 196., 195., ..., 330., 328., 326.],
           [214., 214., 216., ..., 197., 196., 192.],
           [ 34.,  34.,  34., ...,  34.,  34.,  34.],
           [ 81.,  81.,  81., ...,  81.,  80.,  80.]])
    In [81]: f[list(f.keys())[0]]['b']['res'][:].shape
    Out[81]: (4, 725)
    In [82]: f[list(f.keys())[0]]['b']['res'][:].dtype
    Out[82]: dtype('<f8')
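
The object reference seen in Out[68] does not have to be bypassed through '#refs#'. As far as I know, h5py lets you index the file with a reference to get the object it points to, which mirrors results{1}.res directly. Below is a minimal sketch of that idea. The helper names build_demo_file and read_cell_field are made up for illustration, and the demo file is only a small stand-in built to resemble the layout seen in the session above (a 1×1 reference dataset pointing at a group with a res dataset), not the real Basketball_ECO_HC.mat. The final transpose assumes the usual behaviour of v7.3 files, where MATLAB's column-major arrays show up in h5py with their axes reversed.

```python
import os
import tempfile

import h5py
import numpy as np


def build_demo_file(path):
    """Write a tiny stand-in file: a 1x1 'cell' holding a reference to a struct-like group.

    Hypothetical helper, only here so the example runs without the real .mat file.
    """
    with h5py.File(path, "w") as f:
        struct = f.create_group("#refs#/b")
        # Pretend MATLAB holds a 3x4 matrix; on disk the axes appear reversed (4x3).
        struct.create_dataset("res", data=np.arange(12, dtype="<f8").reshape(4, 3))
        struct.create_dataset("fps", data=np.array([[22.5]]))
        cell = f.create_dataset("results", shape=(1, 1), dtype=h5py.ref_dtype)
        cell[0, 0] = struct.ref


def read_cell_field(path, var="results", field="res"):
    """Follow the reference stored in var[0, 0] and return one field as a NumPy array."""
    with h5py.File(path, "r") as f:
        ref = f[var][0, 0]        # the HDF5 object reference stored in the cell
        target = f[ref]           # indexing the file with a reference dereferences it
        data = target[field][()]  # read the whole dataset into memory
    # Assumption: transpose to recover MATLAB's row/column order.
    return data.T


# Usage on the stand-in file
demo_path = os.path.join(tempfile.mkdtemp(), "demo_v73.mat")
build_demo_file(demo_path)
res = read_cell_field(demo_path)
print(res.shape)
```

If the real file is laid out the way the session above suggests, read_cell_field('Downloads/Basketball_ECO_HC.mat') should hand back the res data with 725 rows and 4 columns. I have not run it against that file, so treat it as a starting point.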