scipy.sparse : Set row to zeros


Suppose I have a matrix in CSR format. What is the most efficient way to set a row (or several rows) to zero?

The following code runs quite slowly:

A = A.tolil()
A[indices, :] = 0
A = A.tocsr()

I had to convert to scipy.sparse.lil_matrix because the CSR format seems to support neither fancy indexing nor setting values to slices.


Solution

  • I guess scipy simply does not implement this, but the CSR format itself supports it quite well. See the Wikipedia article on "Sparse matrix" for what indptr and the other arrays mean:

    import scipy.sparse

    # A.indptr holds one offset per row, plus a final entry equal to nnz, so
    # row i's stored values are A.data[A.indptr[i]:A.indptr[i+1]]:
    
    def csr_row_set_nz_to_val(csr, row, value=0):
        """Set all stored elements of a row (those currently in the sparsity
        pattern) to the given value. Mostly useful for setting a row to 0.
        """
        if not isinstance(csr, scipy.sparse.csr_matrix):
            raise ValueError('Matrix given must be of CSR format.')
        csr.data[csr.indptr[row]:csr.indptr[row+1]] = value
    
    # Now you can just do:
    for row in indices:
        csr_row_set_nz_to_val(A, row, 0)
    
    # And to remove zeros from the sparsity pattern:
    A.eliminate_zeros()
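
    To make the slice csr.data[csr.indptr[row]:csr.indptr[row+1]] concrete, here is a small sketch of what the three CSR arrays look like. The example matrix is made up for illustration; the expected values in the comments follow from how CSR stores entries row by row.

```python
import numpy as np
import scipy.sparse as sp

# Made-up 3x3 example; row 1 is entirely zero.
dense = np.array([[1, 0, 2],
                  [0, 0, 0],
                  [0, 3, 4]])
A = sp.csr_matrix(dense)

print(A.data)     # stored values, row by row:        [1 2 3 4]
print(A.indices)  # column of each stored value:      [0 2 1 2]
print(A.indptr)   # offsets into data, one per row +1: [0 2 2 4]

# Row 2 occupies data[indptr[2]:indptr[3]], i.e. data[2:4].
row = 2
start, end = A.indptr[row], A.indptr[row + 1]
row_values = A.data[start:end].copy()      # [3 4]
row_columns = A.indices[start:end].copy()  # [1 2]
```

    An empty row shows up as two equal consecutive entries in indptr, so the slice for it is simply empty.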
    

    Note that eliminate_zeros removes every stored zero from the sparsity pattern, including zeros that were set elsewhere. Whether you want that at this point depends on what you are doing: it may make sense to delay elimination until all other calculations that might introduce new zeros are done, and if you have stored zeros that you intend to change again later, eliminating them would be very bad!

    You could in principle bypass eliminate_zeros and prune and do the same work by hand, but that would be a lot of hassle and might even be slower (because it would not run in C).
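
    If the Python loop over rows becomes the bottleneck for many rows, one possible variation is to build a boolean mask over the stored entries and assign once. This is only a sketch: csr_rows_set_nz_to_val is a hypothetical helper name, it assumes non-negative row indices, and it allocates temporary arrays of length nnz, so it is not guaranteed to be faster in every case.

```python
import numpy as np
import scipy.sparse as sp

def csr_rows_set_nz_to_val(csr, rows, value=0):
    """Set the stored entries of several rows at once (hypothetical helper)."""
    if not (sp.issparse(csr) and csr.format == 'csr'):
        raise ValueError('Matrix given must be of CSR format.')
    rows = np.asarray(rows)
    # Number of stored entries in each row.
    counts = np.diff(csr.indptr)
    # Row index of every stored entry, aligned with the used part of csr.data.
    row_of_entry = np.repeat(np.arange(csr.shape[0]), counts)
    mask = np.isin(row_of_entry, rows)
    # Only the first indptr[-1] elements of data are in use.
    stored = csr.data[:csr.indptr[-1]]
    stored[mask] = value

# Usage on a made-up matrix: zero rows 0 and 2, then drop the stored zeros.
A = sp.csr_matrix(np.array([[1., 0., 2.],
                            [0., 5., 0.],
                            [0., 3., 4.],
                            [7., 0., 0.]]))
csr_rows_set_nz_to_val(A, [0, 2])
A.eliminate_zeros()
```

    As with the per-row version, the sparsity pattern is untouched until eliminate_zeros is called.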


    Details about eliminate_zeros (and prune)

    A sparse matrix generally does not store zero elements; it only records where the nonzero elements are (roughly speaking, and with various methods). eliminate_zeros removes all zeros in your matrix from the sparsity pattern (i.e. no value is stored for that position any more, whereas before a value was stored there, but it was 0). Eliminating is bad if you want to change a 0 to a different value later on; otherwise, it saves space.
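
    The difference between a stored zero and an absent entry can be seen in a small sketch like the following. The matrix is made up for illustration; nnz counts stored entries (including explicit zeros), while count_nonzero counts entries whose value is actually nonzero.

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[1.0, 2.0],
                            [0.0, 3.0]]))

# Zero row 0 in place: the two entries stay in the sparsity pattern.
A.data[A.indptr[0]:A.indptr[1]] = 0

nnz_with_stored_zeros = A.nnz        # 3 entries are still stored
nonzero_values = A.count_nonzero()   # only 1 of them is nonzero

# Drop the explicit zeros from the sparsity pattern.
A.eliminate_zeros()
nnz_after = A.nnz                    # 1 entry remains stored
```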

    Prune just shrinks the stored data arrays when they are longer than necessary. Note that while I first had A.prune() in there, A.eliminate_zeros() already includes prune.