
Import MATLAB cell array into Python for scikit-learn


I have a 1x81 cell array in MATLAB.

Each cell is a 30x30 matrix of doubles.

I want to load this into Python (for use in scikit-learn) as an array with the shape (81, 30, 30).

I've read a few questions here and worked through their code but I'm not having any success.
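If you want to try the answer without MATLAB at hand, a file with the same layout can be generated from Python. This is a minimal sketch that assumes random numbers as a stand-in for your real matrices; it relies on scipy.io.savemat writing NumPy object arrays as MATLAB cell arrays. The variable name C and the file name test.mat are chosen to match what the answer loads.

```python
import numpy as np
from scipy import io

# Hypothetical stand-in data: 81 random 30x30 matrices of doubles.
rng = np.random.default_rng(0)

# savemat writes NumPy object arrays as MATLAB cell arrays, so a (1, 81)
# object array becomes a 1x81 cell array named C in the .mat file.
cells = np.empty((1, 81), dtype=object)
for i in range(81):
    cells[0, i] = rng.random((30, 30))

io.savemat('test.mat', {'C': cells})
```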


Solution

  • You can do this using just scipy.io.loadmat, but you have to be careful because of some differences between how MATLAB and NumPy represent the data.

    from scipy import io
    import numpy as np
    
    C = io.loadmat('test.mat')
    print(type(C))
    print(sorted(C.keys()))
    

    Outputs:

    <class 'dict'>
    ['C', '__globals__', '__header__', '__version__']
    

    So you can see that scipy is including a bunch of extra information that we don't really need, but your cell C is in there too.
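If you'd rather not pick variables out one at a time, the bookkeeping entries can be filtered out by name. A minimal sketch, assuming none of your real variables start with a double underscore (MATLAB variable names must begin with a letter, so this holds for files saved by MATLAB). It writes a small hypothetical three-cell file to a temporary directory so that it runs on its own.

```python
import os
import tempfile

import numpy as np
from scipy import io

# Build a small hypothetical .mat file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'test.mat')
cells = np.empty((1, 3), dtype=object)
for i in range(3):
    cells[0, i] = np.zeros((30, 30))
io.savemat(path, {'C': cells})

contents = io.loadmat(path)
# loadmat adds bookkeeping entries whose names start with '__';
# keep only the actual variables.
variables = {name: value for name, value in contents.items()
             if not name.startswith('__')}
print(sorted(variables))  # ['C']
```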

    C = C['C']
    print(type(C))
    

    Outputs:

    <class 'numpy.ndarray'>
    

    Okay, so the MATLAB cell array comes back as a NumPy array (of dtype object, with one element per cell).

    print(C.shape)
    

    Outputs:

    (1, 81)
    

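    That isn't quite right yet: MATLAB arrays are always at least 2-D, so the 1x81 cell comes back as a (1, 81) array whose elements are the 30x30 matrices. With a bit of processing we can get it the way you want.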
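loadmat can also drop the singleton dimension for you: its squeeze_me=True option squeezes unit dimensions while the file is read. A sketch using a hypothetical stand-in file in which cell i holds a 30x30 matrix filled with the value i:

```python
import os
import tempfile

import numpy as np
from scipy import io

# Hypothetical stand-in file with a 1x81 cell array named C.
path = os.path.join(tempfile.mkdtemp(), 'test.mat')
cells = np.empty((1, 81), dtype=object)
for i in range(81):
    cells[0, i] = np.full((30, 30), float(i))
io.savemat(path, {'C': cells})

# Default: the 1x81 cell comes back as a (1, 81) object array.
C_raw = io.loadmat(path)['C']
print(C_raw.shape, C_raw.dtype)  # (1, 81) object

# squeeze_me=True removes the singleton dimension at load time.
C = io.loadmat(path, squeeze_me=True)['C']
print(C.shape, C[0].shape)  # (81,) (30, 30)
```

Keep in mind that squeeze_me applies to every variable in the file and to the matrices inside the cells as well, so a 30x1 matrix would also come back 1-D. An explicit np.squeeze on the outer array only touches the cell array itself.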

    C = np.squeeze(C)  # (1, 81) -> (81,)
    # Preallocate an (81, 30, 30) float array and copy each 30x30 matrix in.
    X = np.empty((C.shape[0], C[0].shape[0], C[0].shape[1]))
    for i in range(X.shape[0]):
        X[i] = C[i]
    print(X.shape)
    

    Outputs:

    (81, 30, 30)
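The preallocate-and-copy loop works; np.stack performs the same join in a single call. A sketch on a hypothetical (81,) object array standing in for np.squeeze(C):

```python
import numpy as np

# Hypothetical stand-in for np.squeeze(C): an (81,) object array whose
# elements are 30x30 float matrices (here, scaled identity matrices).
C = np.empty(81, dtype=object)
for i in range(81):
    C[i] = np.eye(30) * i

# np.stack iterates over the object array and joins the matrices
# along a new first axis.
X = np.stack(C)
print(X.shape, X.dtype)  # (81, 30, 30) float64
```

np.stack raises a ValueError if the matrices don't all have the same shape, which is a useful sanity check when the cells come from real data.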
    

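    Voilà, your cell array is now a single NumPy array. One word of warning, though: most scikit-learn estimators expect a 2-D input of shape (n_samples, n_features), not a 3-D array, so you will probably need to flatten each 30x30 matrix into a single row first.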
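Flattening is a single reshape. A minimal sketch, using a hypothetical (81, 30, 30) array of consecutive numbers in place of the X built above:

```python
import numpy as np

# Hypothetical stand-in for the (81, 30, 30) array built above.
X = np.arange(81 * 30 * 30, dtype=float).reshape(81, 30, 30)

# Most scikit-learn estimators expect X with shape (n_samples, n_features),
# so turn each 30x30 matrix into one row of 900 features.
X_2d = X.reshape(X.shape[0], -1)
print(X_2d.shape)  # (81, 900)
```

NumPy's reshape uses row-major (C) order by default, whereas MATLAB's M(:) is column-major, so the feature order differs from what you would get in MATLAB unless you pass order='F'. For most estimators only consistency across samples matters.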