Search code examples
pythonnumpyscipysparse-array

Norm of sparse python vectors


Is it possible to effectively obtain the norm of a sparse vector in python?

I tried the following:

from scipy import sparse
from numpy.linalg import norm

vector1 = sparse.csr_matrix([ 0 for i in xrange(4000000) ], dtype = float64)

#just to test I set a few points to a value higher than 0

vector1[ (0, 10) ] = 5
vector1[ (0, 1500) ] = 80
vector1[ (0, 2000000) ] = 6

n = norm(t1)

but then I get the error:

ValueError: dimension mismatch

The norm function only works with arrays so probably that's why the csr_matrix is not working, but then I didn't find another way of computing the norm effectively. One possible solution would be to compute:

norm(asarray(vector1.todense()))

but then it kills the purpose of using sparse vectors at first. And as the last approach I could iterate through each element of the vector and compute manually the norm, but since efficiency is really important I was searching for something faster and easier to implement.

Thanks in advance for any help!

EDIT: I tried everything that was suggested and the best solution is:

(vector1.data ** 2).sum()

from Dougal. But the Cython solution is also very good and works better as the vector grows in number of elements different of zero. Thanks everyone for the help!


Solution

    1. I hope you are not really initializing and setting elements like that, those warnings are raised for a reason, and a 4M temporary list proofs you have plenty of resources left ;).
    2. Calculating a norm by hand is very simple, by just using the underlying data vector1.data directly. You can also use things like vector1.multiply(vector1) plus .sum or vector1.dot(vector1.T) but as Dougal pointed out, that can be much slower for this simple case.
    3. I guess you want to do more, but if you only want vector norms, going through sparse matrices seems like a lot of unnecessary work.