
How to slice a scipy sparse matrix and keep the original indexing?


Let's say I have the following array:

   import numpy as np
   import scipy.sparse as sp_sparse  # alias used below

   a = np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]])
   a = sp_sparse.csr_matrix(a)

and I want to get a submatrix of the sparse array that consists of the first and last rows.

>>> sub_matrix = a[[0, 3], :]
>>> print(sub_matrix)
(0, 0)  1
(0, 1)  2
(0, 2)  3
(1, 0)  4
(1, 1)  5
(1, 2)  6

But I want to keep the original indexing for the selected rows, so for my example, it would be something like:

  (0, 0)    1
  (0, 1)    2
  (0, 2)    3
  (3, 0)    4
  (3, 1)    5
  (3, 2)    6 

I know I could do this by setting all the other rows of the dense array to zero and then building the sparse matrix again, but I would like to know if there is a better way to achieve this.

Any help would be appreciated!


Solution

  • Depending on the indexing, it might be easier to construct the extractor/indexing matrix with the coo style of inputs:

    In [129]: from scipy import sparse
    In [130]: M = sparse.csr_matrix(np.arange(16).reshape(4,4))
    In [131]: M
    Out[131]: 
    <4x4 sparse matrix of type '<class 'numpy.int64'>'
        with 15 stored elements in Compressed Sparse Row format>
    In [132]: M.A
    Out[132]: 
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11],
           [12, 13, 14, 15]])
    

    A square extractor matrix with ones on the "diagonal" at the rows to keep (no shape is passed, so the 4x4 shape is inferred from the largest index, 3):

    In [133]: extractor = sparse.csr_matrix(([1,1],([0,3],[0,3])))
    In [134]: extractor
    Out[134]: 
    <4x4 sparse matrix of type '<class 'numpy.int64'>'
        with 2 stored elements in Compressed Sparse Row format>
    

    Matrix multiplication in one direction selects columns:

    In [135]: M@extractor
    Out[135]: 
    <4x4 sparse matrix of type '<class 'numpy.int64'>'
        with 7 stored elements in Compressed Sparse Row format>
    In [136]: _.A
    Out[136]: 
    array([[ 0,  0,  0,  3],
           [ 4,  0,  0,  7],
           [ 8,  0,  0, 11],
           [12,  0,  0, 15]])
    

    and in the other, rows:

    In [137]: extractor@M
    Out[137]: 
    <4x4 sparse matrix of type '<class 'numpy.int64'>'
        with 7 stored elements in Compressed Sparse Row format>
    In [138]: _.A
    Out[138]: 
    array([[ 0,  1,  2,  3],
           [ 0,  0,  0,  0],
           [ 0,  0,  0,  0],
           [12, 13, 14, 15]])
    In [139]: extractor.A
    Out[139]: 
    array([[1, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 1]])
    

    M[[0,3],:] is equivalent to the same kind of product, but with a 2x4 extractor that maps the selected rows to consecutive output rows:

    In [140]: extractor = sparse.csr_matrix(([1,1],([0,1],[0,3])))
    In [142]: (extractor@M).A
    Out[142]: 
    array([[ 0,  1,  2,  3],
           [12, 13, 14, 15]])
    

    Row and column sums can also be computed with matrix multiplication; multiplying by a vector of ones gives the row sums:

    In [149]: M@np.ones(4,int)
    Out[149]: array([ 6, 22, 38, 54])
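
Applied to the matrix from the question, the diagonal-extractor idea could be wrapped up roughly as follows. This is only a sketch: `keep_rows` is a hypothetical helper name, not part of SciPy, and it assumes the row indices are unique (duplicate coordinates are summed when the extractor is built, which would scale that row). The shape is passed explicitly so the extractor stays square even when the last row is not among the selected ones.

```python
import numpy as np
from scipy import sparse


def keep_rows(mat, rows):
    """Return a sparse matrix with the shape of `mat` where only `rows` are kept.

    Hypothetical helper; assumes `rows` contains unique, valid row indices.
    """
    n = mat.shape[0]
    rows = np.asarray(rows)
    # Selector with a 1 at (r, r) for every kept row r and nothing elsewhere.
    # The explicit shape keeps it n x n regardless of which rows are chosen.
    selector = sparse.csr_matrix(
        (np.ones(len(rows), dtype=mat.dtype), (rows, rows)), shape=(n, n)
    )
    # Left-multiplication keeps the selected rows at their original positions.
    return selector @ mat


a = sparse.csr_matrix(np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]]))
sub = keep_rows(a, [0, 3])
print(sub.toarray())
```

The result has the same 4x3 shape as `a`, so the entries of the last row keep row index 3 instead of being renumbered to 1.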