I have a SciPy CSR matrix and I want to get the column indices of the nonzero elements in each row. My approach is:
import scipy.sparse as sp
N = 100
d = 0.1
M = sp.rand(N, N, d, format='csr')
indM = [row.nonzero()[1] for row in M]
indM is what I need: it has the same number of rows as M and looks like this:
[array([ 6, 7, 11, ..., 79, 85, 86]),
array([12, 20, 25, ..., 84, 93, 95]),
...
array([ 7, 24, 32, 40, 50, 51, 57, 71, 74, 96]),
array([ 1, 4, 9, ..., 71, 95, 96])]
The problem is that this approach is slow for big matrices. Is there any way to avoid the list comprehension or otherwise speed this up?
Thank you.
You can simply use the indices and indptr attributes directly:
import numpy
import scipy.sparse
N = 5
d = 0.3
M = scipy.sparse.rand(N, N, d, format='csr')
M.toarray()
# array([[ 0. , 0. , 0. , 0. , 0. ],
# [ 0. , 0. , 0. , 0. , 0.30404632],
# [ 0.63503713, 0. , 0. , 0. , 0. ],
# [ 0.68865311, 0.81492098, 0. , 0. , 0. ],
# [ 0.08984168, 0.87730292, 0. , 0. , 0.18609702]])
M.indices
# array([4, 0, 0, 1, 0, 1, 4], dtype=int32)
M.indptr
# array([0, 0, 1, 2, 4, 7], dtype=int32)
numpy.split(M.indices, M.indptr)[1:-1]
# [array([], dtype=int32),
# array([4], dtype=int32),
# array([0], dtype=int32),
# array([0, 1], dtype=int32),
# array([0, 1, 4], dtype=int32)]
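As a rough sketch of why this works: in CSR format, the column indices of row i are stored in M.indices[M.indptr[i]:M.indptr[i+1]], so splitting M.indices at the row boundaries gives one array per row. The helper name row_column_indices and the fixed 5x5 matrix below are made up for illustration (the matrix reuses the sparsity pattern of the example above with arbitrary values). Splitting at M.indptr[1:-1] is an equivalent variant that avoids slicing off the two extra pieces afterwards.

```python
import numpy as np
import scipy.sparse as sp


def row_column_indices(M):
    """Return a list with one array of column indices per row (hypothetical helper)."""
    M = sp.csr_matrix(M)  # converts to CSR if the input is in another format
    # Row i occupies M.indices[M.indptr[i]:M.indptr[i + 1]], so splitting at
    # the interior boundaries M.indptr[1:-1] yields exactly one piece per row.
    return np.split(M.indices, M.indptr[1:-1])


# Fixed matrix (illustrative values) with the sparsity pattern shown above
dense = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 3],
    [6, 0, 0, 0, 0],
    [7, 8, 0, 0, 0],
    [1, 9, 0, 0, 2],
], dtype=float)
M = sp.csr_matrix(dense)

rows = row_column_indices(M)

# Cross-check against the row-by-row nonzero() approach from the question
reference = [M[i].nonzero()[1] for i in range(M.shape[0])]
matches = all(np.array_equal(a, b) for a, b in zip(rows, reference))

print([r.tolist() for r in rows])
print(matches)
```

Two caveats: M.indices also lists explicitly stored zeros, which nonzero() filters out, so the two approaches can differ unless you call M.eliminate_zeros() first. The indices within a row are only guaranteed to be in ascending order if M.has_sorted_indices is true; M.sort_indices() sorts them in place.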