numpy matrix vectorization sparse-matrix

Optimize per-element operation on sparse numpy 2D array


I need to compute M = np.log(1 + M) on a large 2D matrix that is 99.9% sparse.

How can I do this efficiently?

x, y = M.nonzero() returns the coordinates of the nonzero elements, but can I vectorize the log operation over those pairs?

NumPy doesn't seem to have sparse matrix support.


Solution

  • The simplest approach is to convert to a SciPy sparse matrix and operate only on its stored values:

    import numpy as np
    import scipy.sparse as sps
    
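    M = sps.csr_matrix(M)    # keep only the nonzero entries
    
    M.data += 1              # M.data holds just the stored values
    M.data = np.log(M.data)  # log(1 + 0) == 0, so the implicit zeros stay zero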
    

    If the matrix is particularly large, you can also take the log in place, which avoids the temporary copy of M.data created above (this requires M.data to have a float dtype):

    M.data += 1
    np.log(M.data, out=M.data)
    

    Both options also work on dense arrays with minor changes, but if your matrix is 99.9% sparse I would switch to a real sparse data structure.

    You could also use the where argument on a dense array, but I doubt it would actually be any faster:

    mask = M != 0  # compute once, before M is modified; M must be a float array
    np.add(M, 1, out=M, where=mask)
    np.log(M, out=M, where=mask)
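
As a possible variation on the sparse approach above, np.log1p computes log(1 + x) in one step, so the separate += 1 is not needed. The sketch below is only an illustration: the small `dense` array is made-up example data, and it assumes a float dtype so the in-place `out=` write is allowed.

```python
import numpy as np
import scipy.sparse as sps

# Hypothetical example data: a small, mostly-zero float matrix.
dense = np.zeros((4, 5))
dense[0, 1] = 2.0
dense[2, 3] = 0.5
dense[3, 0] = 9.0

M = sps.csr_matrix(dense)

# log1p(0) == 0, so applying it only to the stored values
# leaves the implicit zeros (and the sparsity pattern) untouched.
np.log1p(M.data, out=M.data)  # in place; requires a float dtype

# SciPy sparse matrices also expose an element-wise log1p() method
# that returns a new sparse matrix instead of modifying in place.
M2 = sps.csr_matrix(dense).log1p()

# Sanity check against the dense computation from the question.
expected = np.log(1 + dense)
assert np.allclose(M.toarray(), expected)
assert np.allclose(M2.toarray(), expected)
```

The in-place form avoids allocating a second values array, while the log1p() method is the shorter choice when the original matrix needs to be kept.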