Tags: python, numpy, scipy, slice, sparse-matrix

Scipy sparse matrix slicing returns IndexError


If I try to slice a sparse matrix or read the value at a given [row, column], I get an IndexError.

More precisely, I have the following scipy.sparse.csr_matrix, which I save to a file and then load back:

...
>>> A = scipy.sparse.csr_matrix((vals, (rows, cols)), shape=(output_dim, input_dim))
>>> np.save(open('test_matrix.dat', 'wb'), A)
...
>>> A = np.load('test_matrix.dat', allow_pickle=True)
>>> A
array(<831232x798208 sparse matrix of type '<class 'numpy.float32'>'
    with 109886100 stored elements in Compressed Sparse Row format>,
      dtype=object)

However, when I try to get the value at a given [row, column] pair, I get the following error:

>>> A[1,1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array

Why is that happening?

Just to clarify, I'm sure the matrix is not empty, because I can see its contents when I print it:

>>> print(A)
  (0, 1)    0.24914551
  (0, 2)    0.6669922
  (1, 1)    0.75097656
  (1, 3)    0.6640625
  (2, 3)    0.3359375
  (2, 514)  0.34960938
...

Solution

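  • np.save first converts its argument to a NumPy array, and a sparse matrix converts to a 0-dimensional array of dtype=object whose single entry is your sparse matrix. What you load back is that wrapper array, not the matrix itself, so A[1,1] asks for two indices from an array that has no dimensions, hence "too many indices for array". You should use scipy.sparse.save_npz (and scipy.sparse.load_npz to read the file back) instead.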

    For example:

    import scipy.sparse as sps
    import numpy as np
    
    A = sps.csr_matrix((10,10))
    A
    <10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 0 stored elements in Compressed Sparse Row format>
    np.save('test_matrix.dat', A)
    B = np.load('test_matrix.dat.npy', allow_pickle=True)
    B
    array(<10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 0 stored elements in Compressed Sparse Row format>, dtype=object)
    B[1,1]
    IndexError                                Traceback (most recent call last)
    <ipython-input-101-969f8bd5206a> in <module>
    ----> 1 B[1,1]
    
    IndexError: too many indices for array
    sps.save_npz('sparse_dat', A)
    C = sps.load_npz('sparse_dat.npz')
    C
    <10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 0 stored elements in Compressed Sparse Row format>
    C[1,1]
    0.0
    

    Mind you, you can still retrieve A from B like so (B.item() works as well):

    D = B.tolist()
    D
    <10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 0 stored elements in Compressed Sparse Row format>
    D[1,1]
    0.0
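
To make the difference easier to see with actual values, here is a minimal, self-contained sketch of both round trips. The small 3x4 matrix, its values, and the file names are illustrative assumptions, not the data from the question; files are written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sps

# Small stand-in for the matrix in the question (values are illustrative).
rows = np.array([0, 0, 1, 1, 2])
cols = np.array([1, 2, 1, 3, 3])
vals = np.array([0.25, 0.5, 0.75, 0.625, 0.375], dtype=np.float32)
A = sps.csr_matrix((vals, (rows, cols)), shape=(3, 4))

with tempfile.TemporaryDirectory() as tmp:
    # np.save pickles the sparse matrix inside a 0-dimensional object array.
    npy_path = os.path.join(tmp, "test_matrix.npy")
    np.save(npy_path, A)
    B = np.load(npy_path, allow_pickle=True)

    # save_npz / load_npz round-trip the sparse matrix itself.
    npz_path = os.path.join(tmp, "sparse_dat.npz")
    sps.save_npz(npz_path, A)
    C = sps.load_npz(npz_path)

print(B.shape, B.dtype)  # () object
D = B.item()             # unwrap the single object stored in the 0-d array
print(D[1, 1], C[1, 1])  # both index the recovered sparse matrix
```

Checking B.shape is a quick way to tell whether you are holding the wrapper array or the matrix: an empty shape means the sparse matrix still has to be unwrapped before it can be indexed.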