Search code examples
pythonscipysparse-matrix

SciPy sparse matrix (COO,CSR): Clear row


For creating a scipy sparse matrix, I have an array or row and column indices I and J along with a data array V. I use those to construct a matrix in COO format and then convert it to CSR,

matrix = sparse.coo_matrix((V, (I, J)), shape=(n, n))
matrix = matrix.tocsr()

I have a set of row indices for which the only entry should be a 1.0 on the diagonal. So far, I go through I, find all indices that need wiping, and do just that:

def find(lst, a):
    # From <http://stackoverflow.com/a/16685428/353337>
    return [i for i, x in enumerate(lst) if x in a]

# wipe_rows = [1, 55, 32, ...]  # something something

indices = find(I, wipe_rows)  # takes too long
I = numpy.delete(I, indices).tolist()
J = numpy.delete(J, indices).tolist()
V = numpy.delete(V, indices).tolist()

# Add entry 1.0 to the diagonal for each wipe row
I.extend(wipe_rows)
J.extend(wipe_rows)
V.extend(numpy.ones(len(wipe_rows)))

# construct matrix via coo

That works alright, but find tends to take a while.

Any hints on how to speed this up? (Perhaps wiping the rows in COO or CSR format is a better idea.)


Solution

  • If you intend to clear multiple rows at once, this

    def _wipe_rows_csr(matrix, rows):
        assert isinstance(matrix, sparse.csr_matrix)
    
        # delete rows
        for i in rows:
            matrix.data[matrix.indptr[i]:matrix.indptr[i+1]] = 0.0
    
        # Set the diagonal
        d = matrix.diagonal()
        d[rows] = 1.0
        matrix.setdiag(d)
    
        return
    

    is by far the fastest method. It doesn't really remove the lines, but sets all entries to zeros, then fiddles with the diagonal.

    If the entries are actually to be removed, one has to do some array manipulation. This can be quite costly, but if speed is no issue: This

    def _wipe_row_csr(A, i):
        '''Wipes a row of a matrix in CSR format and puts 1.0 on the diagonal.
        '''
        assert isinstance(A, sparse.csr_matrix)
    
        n = A.indptr[i+1] - A.indptr[i]
    
        assert n > 0
    
        A.data[A.indptr[i]+1:-n+1] = A.data[A.indptr[i+1]:]
        A.data[A.indptr[i]] = 1.0
        A.data = A.data[:-n+1]
    
        A.indices[A.indptr[i]+1:-n+1] = A.indices[A.indptr[i+1]:]
        A.indices[A.indptr[i]] = i
        A.indices = A.indices[:-n+1]
    
        A.indptr[i+1:] -= n-1
    
        return
    

    replaces a given row i of the matrix matrix by the entry 1.0 on the diagonal.