Read a MATLAB .mat file using h5py

I want to use the Python 3 package h5py to read a MATLAB .mat file saved in version 7.3 format.

The file contains a MATLAB variable named results.

That variable is a 1x1 cell holding a struct, and the value I need is a field of that struct.

In MATLAB, I can get the data with the following code:

load('.mat PATH');
results{1}.res

How should I read this data with h5py? An example .mat file can be obtained from here

Solution

While h5py can read the HDF5 files that MATLAB writes, figuring out what is in them takes some exploring: looking at keys, groups, and datasets (and possibly attributes). Nothing in scipy will help here, because scipy.io.loadmat only handles the older MATLAB .mat formats, not v7.3.

With the downloaded file:

In [60]: import h5py
In [61]: f = h5py.File('Downloads/Basketball_ECO_HC.mat','r')
In [62]: f
Out[62]: <HDF5 file "Basketball_ECO_HC.mat" (mode r)>
In [63]: f.keys()
Out[63]: <KeysViewHDF5 ['#refs#', 'results']>
In [65]: f['results']
Out[65]: <HDF5 dataset "results": shape (1, 1), type "|O">
In [66]: arr = f['results'][:]
In [67]: arr
Out[67]: array([[<HDF5 object reference>]], dtype=object)
In [68]: arr.item()
Out[68]: <HDF5 object reference>

h5py can dereference an object reference like this by indexing the file with it, as in f[arr.item()], though I'm not familiar with the details and would have to check the h5py docs.
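For what it's worth, here is a minimal sketch of following that reference directly instead of hunting through '#refs#'. It assumes the layout seen in this file (a 1x1 dataset of object references pointing at a group with a res dataset). The function name read_cell_struct_field and its parameters are made up for illustration and are not part of h5py.

```python
import h5py


def read_cell_struct_field(path, var="results", field="res"):
    """Follow the reference stored in a 1x1 cell and return one struct field.

    Hypothetical helper: assumes `var` is a (1, 1) dataset of HDF5 object
    references and that the referenced object is a group containing `field`.
    """
    with h5py.File(path, "r") as f:
        ref = f[var][0, 0]        # the HDF5 object reference held by the cell
        struct = f[ref]           # indexing the file with a reference dereferences it
        data = struct[field][()]  # read the whole dataset into a NumPy array
    # v7.3 files appear in h5py with the dimensions reversed relative to
    # MATLAB, so transpose to get the shape MATLAB would report.
    return data.T
```

With the file above, read_cell_struct_field('Downloads/Basketball_ECO_HC.mat') should correspond to results{1}.res in MATLAB, although I have only reasoned about that from the layout shown here.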

But exploring the other key:

In [69]: list(f.keys())[0]
Out[69]: '#refs#'
In [70]: f[list(f.keys())[0]]
Out[70]: <HDF5 group "/#refs#" (2 members)>
In [71]: f[list(f.keys())[0]].keys()
Out[71]: <KeysViewHDF5 ['a', 'b']>
In [72]: f[list(f.keys())[0]]['a']
Out[72]: <HDF5 dataset "a": shape (2,), type "<u8">
In [73]: _[:]
Out[73]: array([0, 0], dtype=uint64)
In [74]: f[list(f.keys())[0]]['b']
Out[74]: <HDF5 group "/#refs#/b" (7 members)>
In [75]: f[list(f.keys())[0]]['b'].keys()
Out[75]: <KeysViewHDF5 ['annoBegin', 'fps', 'fps_no_ftr', 'len', 'res', 'startFrame', 'type']>
In [76]: f[list(f.keys())[0]]['b']['fps']
Out[76]: <HDF5 dataset "fps": shape (1, 1), type "<f8">
In [77]: f[list(f.keys())[0]]['b']['fps'][:]
Out[77]: array([[22.36617883]])

From the OS shell, I can also inspect the file with h5dump. Its output suggests that the res dataset holds most of the data, and that the datasets also carry attributes. That may be a better way of getting an overview, which can then guide the h5py loads.
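A similar overview can be had without leaving Python. The sketch below walks the file with visititems and records one line per group or dataset, including attribute names. The summarize name and the output format are just illustrative choices, not anything h5py or h5dump defines.

```python
import h5py


def summarize(path):
    """Return one descriptive line per group/dataset in an HDF5 file."""
    lines = []

    def visit(name, obj):
        # `name` is the path relative to the file root, e.g. '#refs#/b/res'
        if isinstance(obj, h5py.Dataset):
            lines.append(
                f"{name}: dataset shape={obj.shape} dtype={obj.dtype} "
                f"attrs={list(obj.attrs)}"
            )
        else:
            lines.append(f"{name}: group ({len(obj)} members) attrs={list(obj.attrs)}")
        # returning None tells visititems to keep walking

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines
```

Printing the result with print('\n'.join(summarize('Downloads/Basketball_ECO_HC.mat'))) should list both 'results' and everything under '#refs#', which makes it easier to spot the large datasets before loading them.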

In [80]: f[list(f.keys())[0]]['b']['res'][:]
Out[80]: 
array([[198., 196., 195., ..., 330., 328., 326.],
       [214., 214., 216., ..., 197., 196., 192.],
       [ 34.,  34.,  34., ...,  34.,  34.,  34.],
       [ 81.,  81.,  81., ...,  81.,  80.,  80.]])
In [81]: f[list(f.keys())[0]]['b']['res'][:].shape
Out[81]: (4, 725)
In [82]: f[list(f.keys())[0]]['b']['res'][:].dtype
Out[82]: dtype('<f8')