Let's say I have the following array:
import numpy as np
import scipy.sparse as sp_sparse
a = np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]])
a = sp_sparse.csr_matrix(a)
and I want to get a submatrix of the sparse array that consists of the first and last rows.
>>> sub_matrix = a[[0, 3], :]
>>> print(sub_matrix)
(0, 0) 1
(0, 1) 2
(0, 2) 3
(1, 0) 4
(1, 1) 5
(1, 2) 6
But I want to keep the original indexing for the selected rows, so for my example, it would be something like:
(0, 0) 1
(0, 1) 2
(0, 2) 3
(3, 0) 4
(3, 1) 5
(3, 2) 6
I know I could do this by setting all the other rows of the dense array to zero and then building the sparse array again, but I would like to know whether there is a better way to achieve this.
Any help would be appreciated!
Depending on the indexing, it might be easier to construct the extractor/indexing matrix with the coo style of inputs, (data, (row, col)):
In [129]: from scipy import sparse
In [130]: M = sparse.csr_matrix(np.arange(16).reshape(4,4))
In [131]: M
Out[131]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 15 stored elements in Compressed Sparse Row format>
In [132]: M.toarray()
Out[132]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
A square extractor matrix with the desired "diagonal" values:
In [133]: extractor = sparse.csr_matrix(([1,1],([0,3],[0,3])))
In [134]: extractor
Out[134]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>
Matrix multiplication in one direction selects columns:
In [135]: M@extractor
Out[135]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [136]: _.toarray()
Out[136]:
array([[ 0,  0,  0,  3],
       [ 4,  0,  0,  7],
       [ 8,  0,  0, 11],
       [12,  0,  0, 15]])
and in the other, rows:
In [137]: extractor@M
Out[137]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 7 stored elements in Compressed Sparse Row format>
In [138]: _.toarray()
Out[138]:
array([[ 0,  1,  2,  3],
       [ 0,  0,  0,  0],
       [ 0,  0,  0,  0],
       [12, 13, 14, 15]])
In [139]: extractor.toarray()
Out[139]:
array([[1, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]])
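Applied to the array from the question, the row-extractor idea above could be wrapped up roughly as follows. This is only a sketch: the helper name keep_rows is made up for illustration, it assumes the row indices are unique, and it passes shape explicitly so the extractor stays square even when the last row is not among the selected ones.

```python
import numpy as np
from scipy import sparse


def keep_rows(M, rows):
    """Return a matrix shaped like M with every row not in `rows` zeroed out.

    Hypothetical helper for illustration; assumes `rows` has no duplicates
    (duplicate entries would be summed and scale that row).
    """
    n_rows = M.shape[0]
    rows = np.asarray(rows)
    data = np.ones(len(rows), dtype=M.dtype)
    # Explicit shape: without it the size is inferred from the largest index,
    # which only gives a square n_rows x n_rows matrix if the last row is selected.
    extractor = sparse.csr_matrix((data, (rows, rows)), shape=(n_rows, n_rows))
    # Left-multiplying keeps the selected rows in place and zeroes the rest.
    return extractor @ M


a = sparse.csr_matrix(np.array([[1, 2, 3], [0, 1, 2], [1, 3, 4], [4, 5, 6]]))
sub = keep_rows(a, [0, 3])
print(sub.toarray())
```

The result has the same shape as a, and its stored entries sit in rows 0 and 3 only, which is the indexing asked for in the question.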
Indexing with M[[0,3],:] does the same thing, but with a non-square (2x4) extractor, so the unselected rows are dropped rather than zeroed:
In [140]: extractor = sparse.csr_matrix(([1,1],([0,1],[0,3])))
In [142]: (extractor@M).toarray()
Out[142]:
array([[ 0,  1,  2,  3],
       [12, 13, 14, 15]])
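As a quick check of that equivalence, the compact extractor product can be compared with plain row indexing. The variable names here (compact_extractor, by_product, by_indexing) are only illustrative, and the comparison goes through dense arrays, which is fine for a small example but not something to do on a large matrix.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(16).reshape(4, 4))
rows = [0, 3]

# Row k of the extractor holds a single 1 in column rows[k],
# so the product has one output row per selected row.
compact_extractor = sparse.csr_matrix(
    (np.ones(len(rows), dtype=M.dtype), (np.arange(len(rows)), rows)),
    shape=(len(rows), M.shape[0]),
)

by_product = compact_extractor @ M
by_indexing = M[rows, :]

same = np.array_equal(by_product.toarray(), by_indexing.toarray())
print(same)
```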
Row and column sums can also be computed with matrix multiplication; multiplying by a vector of ones gives the row sums:
In [149]: M@np.ones(4,int)
Out[149]: array([ 6, 22, 38, 54])
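For completeness, here is a small sketch of both directions. Using the transpose for the column sums is one choice among several, made here so that the sparse matrix is always the left operand.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(16).reshape(4, 4))
ones = np.ones(4, dtype=int)

row_sums = M @ ones    # one value per row
col_sums = M.T @ ones  # one value per column

# Cross-check against the built-in sum, which returns a 2-D result for
# sparse matrices, hence the ravel().
assert np.array_equal(row_sums, np.asarray(M.sum(axis=1)).ravel())
assert np.array_equal(col_sums, np.asarray(M.sum(axis=0)).ravel())

print(row_sums)
print(col_sums)
```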