I am trying to read MATLAB v7.3 .mat files in Python using h5py.
I am encountering a problem where the representations of character arrays (typically .mat fields containing a single string) and uint16 arrays appear identical.
>> ushortarr = uint16([69 109 105 102])
>> strarr = 'gibl'
>> save('short_string_difference.mat', 'ushortarr', 'strarr', '-v7.3')
When the file is loaded back into MATLAB, MATLAB correctly detects the data types of these variables:
>> ss73 = load('short_string_difference.mat')
ss73 =
strarr: 'gibl'
ushortarr: [69 109 105 102]
But h5py suggests that the structure of this file is as follows:
(Pdb) strarr
<HDF5 dataset "strarr": shape (4, 1), type "<u2">
(Pdb) ushortarr
<HDF5 dataset "ushortarr": shape (4, 1), type "<u2">
(Pdb) strarr.value
array([[103],
       [105],
       [ 98],
       [108]], dtype=uint16)
(Pdb) ushortarr.value
array([[ 69],
       [109],
       [105],
       [102]], dtype=uint16)
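The two datasets really are stored the same way. A small NumPy sketch using the values printed above (hard-coded here rather than read from the file) shows that the numbers in strarr are just character codes, and that the (4, 1) shape is MATLAB's 1x4 row vector with its dimensions reversed, since MATLAB stores arrays column-major while h5py reports them in row-major (C) order:

```python
import numpy as np

# Values exactly as h5py printed them above: shape (4, 1), dtype uint16.
strarr = np.array([[103], [105], [98], [108]], dtype=np.uint16)
ushortarr = np.array([[69], [109], [105], [102]], dtype=np.uint16)

# Same dtype and same shape: the raw data cannot tell a char array from a uint16 array.
assert strarr.dtype == ushortarr.dtype and strarr.shape == ushortarr.shape

# Transposing restores MATLAB's 1x4 orientation; each value is a character code.
as_text = "".join(chr(c) for c in strarr.T.ravel())
print(as_text)  # gibl

# The same conversion also "decodes" the numeric array, which is exactly the ambiguity.
print("".join(chr(c) for c in ushortarr.T.ravel()))  # Emif
```

So the type information has to come from somewhere other than the dataset's dtype or shape.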
(I also checked and found that Octave behaves like h5py for v7.3 MATLAB files, but that both scipy.io.loadmat and Octave handle older (v7 and earlier) .mat files correctly. Bug reports suggest that neither has a fix for this or for a number of other problems with v7.3 .mat files, and that they don't officially support v7.3 at all.)
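Since scipy.io.loadmat handles the older formats but not v7.3, it can help to check which format a file uses before picking a reader. This is a minimal heuristic sketch, assuming that v7.3 files begin with a text header starting "MATLAB 7.3 MAT-file" and place the HDF5 signature after a 512-byte user block; the function names are hypothetical, not part of any library:

```python
# HDF5 data begins (after any user block) with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_v73(header: bytes) -> bool:
    """Heuristic check on the first 520 bytes of a .mat file (hypothetical helper)."""
    # Assumption: v7.3 files start with a text header such as "MATLAB 7.3 MAT-file, ..."
    # and put the HDF5 superblock right after a 512-byte user block.
    return header.startswith(b"MATLAB 7.3 MAT-file") and header[512:520] == HDF5_SIGNATURE


def is_matlab_v73(path: str) -> bool:
    """Read just enough of the file to apply looks_like_v73 (hypothetical helper)."""
    with open(path, "rb") as fh:
        return looks_like_v73(fh.read(520))
```

If is_matlab_v73 returns False, scipy.io.loadmat is the natural choice; otherwise the file can be opened with h5py.File(path, "r").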
My question is this: what data that h5py ignores, or what other trick, does MATLAB use to determine the types of these variables when it loads them from this file? A secondary question: is there a Python reader implementation that performs whatever check MATLAB uses to make this determination?
You have to look at the dataset's attributes, which can be accessed via:
strarr.attrs
There you will find an attribute named MATLAB_class, which is either char or uint16.
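Putting that together, a reader can dispatch on MATLAB_class. The sketch below is an illustration under stated assumptions, not a complete v7.3 reader: it assumes h5py returns the attribute as bytes (e.g. b'char'), that char data are UTF-16 code units, and that logicals are stored as uint8 0/1; structs, cell arrays and empty arrays are not handled. The function names are hypothetical. It reads data with dset[()], which replaces the deprecated Dataset.value:

```python
import numpy as np


def decode_matlab_value(matlab_class, data):
    """Convert raw h5py data to a Python value using the MATLAB_class attribute."""
    if isinstance(matlab_class, bytes):  # h5py typically returns this attribute as bytes
        matlab_class = matlab_class.decode("ascii")
    # HDF5 stores MATLAB's column-major arrays with dimensions reversed; .T restores MATLAB's shape.
    arr = np.asarray(data).T
    if matlab_class == "char":
        # Assumption: UTF-16 code units, one string per row of the MATLAB char array.
        rows = [row.astype("<u2").tobytes().decode("utf-16-le") for row in np.atleast_2d(arr)]
        return rows[0] if len(rows) == 1 else rows
    if matlab_class == "logical":
        return arr.astype(bool)  # assumption: logicals are stored as uint8 0/1
    return arr  # numeric classes such as uint16 or double stay as arrays


def decode_dataset(dset):
    """Decode one h5py Dataset (e.g. f["strarr"]) using its MATLAB_class attribute."""
    return decode_matlab_value(dset.attrs.get("MATLAB_class", b""), dset[()])
```

With the file from the question opened as f = h5py.File("short_string_difference.mat", "r"), decode_dataset(f["strarr"]) should return 'gibl', while decode_dataset(f["ushortarr"]) should stay a 1x4 uint16 array.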