python, scipy, sparse-matrix

Bulk zeroing of elements in a scipy.sparse matrix


I've got a scipy.sparse matrix A and I want to zero out a sizable fraction of its elements. (In the matrices I'm working with today, A has about 70M entries and I want to zero out about 700K of them.) I have those elements available in a couple of different formats, but for now they're in a sparse matrix B with the same dimensions as A and 0/1 values.

If these were dense matrices (EDIT: numpy arrays), I could do A = A - A*B, but I haven't been able to come up with an easy way to do this with sparse matrices, or really any way at all beyond (a) iterating through the elements in B and setting A to 0 at those positions, or (b) converting everything to dense, which for the sizes I have will just barely fit in memory.


Solution

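  • SciPy's sparse matrices have a multiply method that does pointwise (element-wise) multiplication. Since B holds 1 exactly at the positions to clear, A.multiply(B) picks out those entries of A, and subtracting them zeroes them. You can simply do: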

    A = A - A.multiply(B)
    

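    I thought you might have to run the eliminate_zeros() method afterwards to get rid of the zeroed entries, but apparently that is not necessary. In the session below, the stored-element count drops from 1000 to 904, i.e. by the 96 entries flagged in zero_mat: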

    >>> sp_mat
    <1000000x1000000 sparse matrix of type '<type 'numpy.float64'>'
        with 1000 stored elements in Compressed Sparse Row format>
    >>> zero_mat
    <1000000x1000000 sparse matrix of type '<type 'numpy.int32'>'
        with 96 stored elements in Compressed Sparse Row format>
    >>> sp_mat - sp_mat.multiply(zero_mat)
    <1000000x1000000 sparse matrix of type '<type 'numpy.float64'>'
        with 904 stored elements in Compressed Sparse Row format>
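
For anyone who wants to try this on something small first, here is a minimal sketch of the same approach. The 3x3 values and the names A, B and result are made up for illustration, and the eliminate_zeros() call is only a precaution in case a given SciPy version keeps explicit zeros after the subtraction; the session above suggests it is not needed for CSR.

```python
import numpy as np
import scipy.sparse as sp

# Tiny stand-in for the large matrix in the question (values are made up).
A = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                            [0.0, 3.0, 0.0],
                            [4.0, 0.0, 5.0]]))

# 0/1 mask with the same shape as A: 1 marks an entry to zero out.
B = sp.csr_matrix(np.array([[1, 0, 0],
                            [0, 1, 0],
                            [0, 0, 0]]))

# Element-wise product keeps only the masked entries of A;
# subtracting it clears exactly those entries.
result = A - A.multiply(B)

# Precaution only: drop any explicitly stored zeros, if present.
result.eliminate_zeros()

print(result.toarray())
print(result.nnz)
```

Everything stays sparse throughout, so memory use scales with the number of stored elements rather than with the full matrix dimensions.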