Is it possible to efficiently obtain the norm of a sparse vector in Python?
I tried the following:
from scipy import sparse
from numpy.linalg import norm
vector1 = sparse.csr_matrix((1, 4000000), dtype=float)  # empty 1 x 4,000,000 sparse row vector
# just to test, I set a few entries to nonzero values
vector1[ (0, 10) ] = 5
vector1[ (0, 1500) ] = 80
vector1[ (0, 2000000) ] = 6
n = norm(vector1)
but then I get the error:
ValueError: dimension mismatch
The norm function only works with arrays, which is probably why the csr_matrix does not work, but I could not find another way to compute the norm efficiently. One possible solution would be to compute:
norm(asarray(vector1.todense()))
but that defeats the purpose of using sparse vectors in the first place. As a last resort I could iterate over the elements of the vector and compute the norm manually, but since efficiency is really important I was looking for something faster and easier to implement.
Thanks in advance for any help!
EDIT: I tried everything that was suggested and the best solution is:
(vector1.data ** 2).sum()
from Dougal. The Cython solution is also very good, and performs better as the number of nonzero elements in the vector grows. Thanks everyone for the help!
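For anyone landing here later, a minimal sketch of the .data approach, assuming NumPy and SciPy are installed. The helper name sparse_norm is only illustrative and not part of either library. Note that (vector1.data ** 2).sum() gives the squared norm, so a square root is still needed to get the Euclidean norm itself.

```python
import numpy as np
from scipy import sparse

def sparse_norm(vec):
    """Euclidean norm of a SciPy sparse vector, using only its stored values."""
    # .data holds the explicitly stored entries; implicit zeros contribute nothing.
    return np.sqrt((vec.data ** 2).sum())

# Build the 1 x 4,000,000 vector from (values, (rows, cols)) directly,
# which avoids creating a dense list of zeros first.
values = np.array([5.0, 80.0, 6.0])
rows = np.zeros(3, dtype=int)
cols = np.array([10, 1500, 2000000])
vector1 = sparse.csr_matrix((values, (rows, cols)), shape=(1, 4000000))

n = sparse_norm(vector1)
```

The cost depends only on the number of stored entries, not on the nominal length of the vector.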
vector1.data directly. You can also use things like vector1.multiply(vector1) plus .sum, or vector1.dot(vector1.T), but as Dougal pointed out, that can be much slower for this simple case.
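A small sketch of the three variants mentioned above, side by side, assuming NumPy and SciPy. The variable names are illustrative, and no timings are claimed here; the point is only that all three yield the same squared norm.

```python
import numpy as np
from scipy import sparse

values = np.array([5.0, 80.0, 6.0])
rows = np.zeros(3, dtype=int)
cols = np.array([10, 1500, 2000000])
vector1 = sparse.csr_matrix((values, (rows, cols)), shape=(1, 4000000))

# 1) Work on the stored values only.
sq_data = (vector1.data ** 2).sum()

# 2) Element-wise product, then sum over all entries.
sq_multiply = vector1.multiply(vector1).sum()

# 3) Sparse product with the transpose; the result is a 1 x 1 sparse matrix.
sq_dot = vector1.dot(vector1.T)[0, 0]

# Take the square root of each squared norm to get the Euclidean norm.
norms = [float(np.sqrt(s)) for s in (sq_data, sq_multiply, sq_dot)]
```

Variants 2 and 3 build an intermediate sparse matrix before reducing it, which is one plausible reason they can be slower than touching .data alone.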