I've got a scipy.sparse matrix A and I want to zero out a decently sized fraction of the elements. (In the matrices I'm working with today, A has about 70M entries and I want to zero out about 700K of them.) I have those elements available in a couple of different formats, but for now they're in a sparse matrix B of the same dimensions as A, with 0/1 values.
If these were dense matrices (EDIT: numpy arrays), I could do A = A-A*B, but I haven't been able to come up with any easy way to do this with sparse matrices (or really any way at all beyond (a) iterating through the elements in B and setting A to 0 at those elements, or (b) converting everything to dense, which for the sizes I have will just barely fit in memory).
Scipy's sparse matrices have a multiply method that does pointwise multiplication. You can simply do:
A = A - A.multiply(B)
I thought you might have to run the eliminate_zeros() method to get rid of the zeroed entries, but apparently that is not necessary:
>>> sp_mat
<1000000x1000000 sparse matrix of type '<type 'numpy.float64'>'
with 1000 stored elements in Compressed Sparse Row format>
>>> zero_mat
<1000000x1000000 sparse matrix of type '<type 'numpy.int32'>'
with 96 stored elements in Compressed Sparse Row format>
>>> sp_mat - sp_mat.multiply(zero_mat)
<1000000x1000000 sparse matrix of type '<type 'numpy.float64'>'
with 904 stored elements in Compressed Sparse Row format>
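
For what it's worth, here is a small self-contained sketch of the same idea that you can run to check the result by eye. The 3x3 matrices are made up for illustration (they are not the sp_mat and zero_mat from the session above), and the eliminate_zeros() call is only a safeguard in case your SciPy version keeps explicit zeros after the subtraction:

```python
import numpy as np
from scipy import sparse

# Made-up example data: A holds the values, B marks (with 1s) the entries to zero out.
A = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 3.0, 0.0],
                                [4.0, 0.0, 5.0]]))
B = sparse.csr_matrix(np.array([[1, 0, 0],
                                [0, 0, 0],
                                [0, 0, 1]]))

# Pointwise product keeps only the entries of A selected by B;
# subtracting it from A zeroes exactly those entries.
result = (A - A.multiply(B)).tocsr()

# Safeguard: drop any explicitly stored zeros so nnz reflects true nonzeros.
result.eliminate_zeros()

print(result.toarray())
print(result.nnz)
```

Everything stays sparse here, so nothing close to the full dense array should need to be allocated.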