python hdf5 h5py hdf5storage

Python hdf5storage is transposing my data?


Python Code:

import h5py
import hdf5storage
from functools import reduce
import numpy as np
from operator import mul

sz = 128,256,512
a = np.random.normal(size=reduce(mul,sz)).reshape(sz)
save_dict = {'data':a}

spath = r"test.mat"
hdf5storage.savemat(spath, mdict=save_dict, append_mat=False, 
                    store_python_metadata=True, format='7.3')

with h5py.File(spath, 'r') as file:
    b = np.array(file['data'])

# hdf5storage.loadmat returns the correct shape, but the array is F-contiguous.
# (scipy.io.loadmat doesn't work with v7.3 files.)
c = hdf5storage.loadmat(spath)['data']

When a is created, it has shape (128, 256, 512). However, when I save a to the .mat file using hdf5storage and then load it into b using h5py, b is transposed and has shape (512, 256, 128). Both a and b are C-contiguous according to their flags.

Is there any way to prevent this transpose from happening? I was under the impression that the HDF5 format stores data in row-major order.


Solution

  • I looked again at the abc.h5 file described in:

    how to import .mat-v7.3 file using h5py

    It was created in Octave with:

    >> A = [1,2,3;4,5,6];
    >> B = [1,2,3,4];
    >> save -hdf5 abc.h5 A B
    

    Using h5py:

    In [102]: f = h5py.File('abc.h5','r')
    In [103]: A = f['A']['value'][:]
    In [104]: A
    Out[104]: 
    array([[1., 4.],
           [2., 5.],
           [3., 6.]])
    In [105]: A.shape
    Out[105]: (3, 2)
    In [106]: A.flags
    Out[106]: 
      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      ...
    In [107]: A.ravel()
    Out[107]: array([1., 4., 2., 5., 3., 6.])
    

    So it's a transposed, C-order array: the 2x3 matrix from Octave comes back with shape (3, 2). Apparently that's how MATLAB developers have chosen to store their matrices in HDF5.

    I could transpose it in numpy:

    In [108]: At = A.T
    In [109]: At
    Out[109]: 
    array([[1., 2., 3.],
           [4., 5., 6.]])
    In [110]: At.flags
    Out[110]: 
      C_CONTIGUOUS : False
      F_CONTIGUOUS : True
      ....
    

    As is normal, a C-order array becomes F-order when transposed.

    The same Octave matrices, saved with the older .mat format and loaded with scipy.io.loadmat:

    In [115]: data = io.loadmat('../abc.mat')
    In [116]: data['A']
    Out[116]: 
    array([[1., 2., 3.],
           [4., 5., 6.]])
    In [117]: _.flags
    Out[117]: 
      C_CONTIGUOUS : False
      F_CONTIGUOUS : True
    

    So the h5py array, transposed, matches the convention that io.loadmat has been using for quite some time.

    I don't have hdf5storage installed on this OS. But judging by your tests, it follows the io.loadmat convention: correct shape, but F order.
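
If the goal is a C-contiguous array with the original shape, one option is to reverse the axes of what h5py returns and then make a copy. The sketch below uses only NumPy and simulates the stored layout rather than reading a real .mat file. It assumes the axes come back in reverse order, as observed above, and the variable names are illustrative.

```python
import numpy as np

# Small stand-in for the array that was saved (the question's `a`).
a = np.arange(24, dtype=float).reshape(2, 3, 4)

# Assumption: h5py hands back the data with the axes reversed and C-contiguous,
# as observed for `b` in the question. Simulated here instead of read from disk.
b = np.ascontiguousarray(a.T)
print(b.shape, b.flags['C_CONTIGUOUS'])

# Reversing the axes again restores the shape without copying. The result is an
# F-contiguous view, like the array hdf5storage.loadmat returned.
restored = b.T
print(restored.shape, restored.flags['F_CONTIGUOUS'])

# An explicit copy gives a C-contiguous array with the original shape.
c_order = np.ascontiguousarray(restored)
print(c_order.flags['C_CONTIGUOUS'], np.array_equal(c_order, a))
```

The final copy needs as much memory as the array itself, so if the downstream code accepts F-ordered arrays, the transposed view alone may be enough.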