If you have a sparse matrix X:
>>> print(type(X))
<class 'scipy.sparse.csr.csr_matrix'>
How can you sum the squares of the elements in each row and save the results into a list? For example:
>>> print(X.todense())
[[0 2 0 2]
 [0 2 0 1]]
How can you turn that into a list of the sum of squares of each row:
[[0²+2²+0²+2²]
[0²+2²+0²+1²]]
or:
[8, 5]
First of all, the csr matrix already has a .sum method (relying on the dot product) which works well, so the only missing piece is the squaring. The simplest solution is to create a copy of the sparse matrix, square its data, and then sum it:
squared_X = X.copy()
# now square the data in squared_X
squared_X.data **= 2
# and sum each row:
squared_sum = squared_X.sum(1)
# and delete the squared_X:
del squared_X
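To get the plain list the question asks for, the result of .sum(1) still needs to be flattened, since for a csr_matrix it comes back as a 2-D column. Here is a minimal runnable sketch of the copy-and-square approach using the example matrix from the question; the name row_sums is just for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix

# The example matrix from the question.
X = csr_matrix(np.array([[0, 2, 0, 2],
                         [0, 2, 0, 1]]))

# Work on a copy so X itself is left untouched.
squared_X = X.copy()
squared_X.data **= 2  # squares only the stored (nonzero) values

# .sum(axis=1) returns a 2-D column; flatten it and convert to a list.
row_sums = np.asarray(squared_X.sum(axis=1)).ravel().tolist()
del squared_X

print(row_sums)  # [8, 5]
```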
If you really must save the space, I guess you could just square .data in place and then restore it afterwards, something along these lines:
X.sum_duplicates()  # just to make sure; not sure if duplicates happen with normal usage.
old_data = X.data.copy()
X.data **= 2
squared_sum = X.sum(1)
X.data = old_data
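If you go the in-place route, it may be worth wrapping it so the original values are restored even if something fails halfway. This is only a sketch: row_square_sums_inplace is a made-up helper name, not part of scipy. Note that it still copies .data, so the saving is limited to the index arrays:

```python
import numpy as np
from scipy.sparse import csr_matrix

def row_square_sums_inplace(X):
    """Hypothetical helper: sum of squares per row, squaring X.data in place."""
    X.sum_duplicates()          # merge duplicate entries, if there are any
    old_data = X.data.copy()    # keep the original values for restoring
    try:
        X.data **= 2
        return np.asarray(X.sum(axis=1)).ravel()
    finally:
        X.data = old_data       # always put the original values back

X = csr_matrix(np.array([[0, 2, 0, 2],
                         [0, 2, 0, 1]]))
result = row_square_sums_inplace(X)
print(result.tolist())  # [8, 5]
```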
EDIT: There is actually another nice way, as the csr matrix has a .multiply method for elementwise multiplication:
squared_sum = X.multiply(X).sum(1)
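As a quick sanity check, the .multiply version can be compared against the dense computation on the example matrix. The dense comparison is only there for illustration and would defeat the purpose on a large matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix

X = csr_matrix(np.array([[0, 2, 0, 2],
                         [0, 2, 0, 1]]))

# Elementwise product of X with itself, then sum along each row.
squared_sum = X.multiply(X).sum(axis=1)
as_list = np.asarray(squared_sum).ravel().tolist()

# Dense reference computation, for checking only.
dense = X.toarray()
expected = (dense * dense).sum(axis=1).tolist()

print(as_list)  # [8, 5]
```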
Addition:
Elementwise operations are thus easily done by accessing csr.data, which stores the values of all nonzero elements. NOTE: I guess .sum_duplicates() may be necessary first; I am not sure what kind of operations would make it necessary.
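One case where duplicates do show up, as far as I can tell, is a matrix built directly from (data, indices, indptr) arrays that list the same position more than once. The sketch below constructs such a matrix on purpose to show why squaring .data without merging first gives the wrong answer; the variable names are only illustrative:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two stored entries that both refer to position (0, 0).
data = np.array([1, 1])
indices = np.array([0, 0])
indptr = np.array([0, 2])
X = csr_matrix((data, indices, indptr), shape=(1, 2))

dense_value = int(X.toarray()[0, 0])   # duplicates add up: 1 + 1 = 2

# Squaring the raw stored values gives 1**2 + 1**2, not 2**2.
naive = int((X.data ** 2).sum())

X.sum_duplicates()                     # merge duplicate entries in place
correct = int((X.data ** 2).sum())     # now a single stored value of 2

print(dense_value, naive, correct)  # 2 2 4
```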