Tags: python, numpy, scipy, mat-file

Index elements in specific dimension numpy


I know the title is very general but I don't know of a better way to describe my question.

I'm using scipy's io.loadmat to load a MATLAB .mat file. The file originally contained structs, which I suppose were converted to numpy arrays. Its structure is as follows: there are 500 structs, each with 3 fields.

print(data.shape)
(500, )

The first and second fields have elements of shape (300, 300)

print(data[0][0].shape)
(300, 300)
print(data[499][0].shape)
(300, 300)
print(data[0][1].shape)
(300, 300)
print(data[499][1].shape)
(300, 300)

The third field is a scalar, stored as a (1, 1) array

print(data[0][2].shape)
(1, 1)
print(data[499][2].shape)
(1, 1)

I want to split up this file so that I have three variables of shape (500, 300, 300), (500, 300, 300) and (500, ).

I've tried

field1 = data[:][0]

but it gives the wrong elements: field1[0] is data[0][0], field1[1] is data[0][1], field1[2] is data[0][2], and field1[3] raises an invalid index error. I want field1[0] = data[0][0], ..., field1[499] = data[499][0].

How do I index across the dimension of size 500?

I know I can do

field1 = np.array([data[i][0] for i in range(500)])

but I'm wondering if there's something simpler


Solution

  • Sounds like you have a structured array with 3 fields. Something along these lines:

    two fields:

    In [38]: dt = np.dtype([('f0',int,(2,2)),('f1','U3',(1,1))])                                           
    

    four records/items:

    In [39]: data = np.zeros((4,), dtype=dt)                                                               
    In [40]: data                                                                                          
    Out[40]: 
    array([([[0, 0], [0, 0]], [['']]), ([[0, 0], [0, 0]], [['']]),
           ([[0, 0], [0, 0]], [['']]), ([[0, 0], [0, 0]], [['']])],
          dtype=[('f0', '<i8', (2, 2)), ('f1', '<U3', (1, 1))])
    In [41]: data.shape                                                                                    
    Out[41]: (4,)
    

    one record:

    In [42]: data[0]                                                                                       
    Out[42]: ([[0, 0], [0, 0]], [['']])
    

    within a single record, a field may be selected by position, because the record is a tuple (or tuple-like):

    In [43]: data[0][0]                                                                                    
    Out[43]: 
    array([[0, 0],
           [0, 0]])
    

    but to select by field for all records, use the name:

    In [45]: data['f0']                                                                                    
    Out[45]: 
    array([[[0, 0],
            [0, 0]],
    
           [[0, 0],
            [0, 0]],
    
           [[0, 0],
            [0, 0]],
    
           [[0, 0],
            [0, 0]]])
    In [46]: data['f0'].shape                                                                              
    Out[46]: (4, 2, 2)
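
Applied to the shapes in the question, the same idea can be sketched as below. This is only an illustration under assumptions: the field names 'a', 'b' and 'c' are made up (check data.dtype.names for the real ones), and the sizes are shrunk to 5 records of 3x3 matrices. It also covers a second case that I'm assuming may apply here, since loadmat often returns struct fields with object dtype ('O') rather than fixed-shape fields; there, selecting by name yields an array of arrays, and np.stack can combine them.

```python
import numpy as np

# Case 1: fixed-shape (subarray) fields, like the dtype in the example above.
# Field names 'a', 'b', 'c' are hypothetical stand-ins for the MATLAB field names.
dt = np.dtype([('a', float, (3, 3)), ('b', float, (3, 3)), ('c', float, (1, 1))])
data = np.zeros((5,), dtype=dt)
data['c'] = np.arange(5).reshape(5, 1, 1)

names = data.dtype.names            # ('a', 'b', 'c'): the names to index with

field1 = data['a']                  # shape (5, 3, 3)
field2 = data['b']                  # shape (5, 3, 3)
field3 = data['c'].reshape(-1)      # (5, 1, 1) flattened to (5,)

# data[:] is a view of the whole array, so data[:][0] is just record 0 again.
# That is why data[:][0] has exactly 3 entries (one per field), not 500.
first_record = data[:][0]
n_fields = len(first_record.dtype.names)

# Case 2: object-dtype fields (an assumption about what loadmat may return).
dt_obj = np.dtype([('a', object), ('c', object)])
data_obj = np.empty((5,), dtype=dt_obj)
for i in range(5):
    data_obj['a'][i] = np.full((3, 3), float(i))
    data_obj['c'][i] = np.array([[float(i)]])

# data_obj['a'] is a (5,) array of separate (3, 3) arrays; stack them along a new axis.
stacked = np.stack(list(data_obj['a']))                  # shape (5, 3, 3)
scalars = np.stack(list(data_obj['c'])).reshape(-1)      # shape (5,)
```

If data.dtype shows fixed shapes such as (300, 300) for the fields, the first case applies and data['name'] is all that is needed; if it shows 'O', the np.stack variant is one way to get the (500, 300, 300) arrays without an explicit index loop.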