Tags: python, matrix, scipy, sparse-matrix

Python SciPy: How to traverse the non-zeros in the upper/lower triangular portion of a csr_matrix


I have a very sparse matrix (a similarity matrix) with dimensions 300k * 300k. To find the pairs of users with relatively high similarity, I only need the upper (or lower) triangular portion of the matrix. How can I efficiently get the coordinates of the users whose value is larger than a threshold? Thanks.


Solution

  • How about

    sparse.triu(M)
    

    For example, if M is

    In [819]: M.toarray()
    Out[819]: 
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]], dtype=int32)
    
    In [820]: sparse.triu(M).toarray()
    Out[820]: 
    array([[0, 1, 2],
           [0, 4, 5],
           [0, 0, 8]], dtype=int32)
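    In a similarity matrix the diagonal usually holds each user's similarity with itself, which is not useful for finding pairs. A small sketch (the 3x3 matrix is made up for illustration) of using triu's k argument to skip the diagonal, and tril for the lower triangle:

```python
import numpy as np
from scipy import sparse

# Made-up symmetric "similarity" matrix; the diagonal holds self-similarities
M = sparse.csr_matrix(np.array([[9, 1, 2],
                                [1, 9, 5],
                                [2, 5, 9]]))

# k=1 starts one diagonal above the main one, so self-similarities are dropped
upper = sparse.triu(M, k=1)
# tril with k=-1 gives the mirror image strictly below the diagonal
lower = sparse.tril(M, k=-1)

print(upper.toarray())
print(lower.toarray())
```

    For a symmetric matrix either triangle carries the same information, so picking one halves the number of pairs to inspect.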
    

    To keep only the nonzeros above the threshold, apply the comparison first. Since 0 > 2 is False, M>2 is still a sparse (boolean) matrix, so triu and nonzero work on it directly:

    In [826]: sparse.triu(M>2).toarray()
    Out[826]: 
    array([[False, False, False],
           [False,  True,  True],
           [False, False,  True]], dtype=bool)
    
    In [827]: sparse.triu(M>2).nonzero()
    Out[827]: (array([1, 1, 2], dtype=int32), array([1, 2, 2], dtype=int32))
    
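    If you also want the similarity values and not just the coordinates, scipy.sparse.find returns the row indices, column indices and values of the stored entries in one call. A sketch with a made-up matrix and a hypothetical threshold of 1:

```python
import numpy as np
from scipy import sparse

# Illustrative similarity matrix (values are made up)
M = sparse.csr_matrix(np.array([[9, 1, 2],
                                [1, 9, 5],
                                [2, 5, 9]]))
threshold = 1  # hypothetical cut-off

# find() returns (row indices, column indices, values) of the nonzero entries
rows, cols, vals = sparse.find(sparse.triu(M, k=1))

# Keep only the user pairs whose similarity exceeds the threshold
keep = vals > threshold
pairs = list(zip(rows[keep].tolist(), cols[keep].tolist(), vals[keep].tolist()))
print(pairs)
```

    The ordering of the returned entries is an implementation detail, so sort the pairs if you need a stable order.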

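    Here's the code for triu (as it looked in SciPy at the time). It converts the matrix to COO format and keeps only the entries whose column index is at least row + k: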

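    from scipy.sparse import coo_matrix

    def triu(A, k=0, format=None):
        # COO stores parallel row/col/data arrays (no copy if A is already COO)
        A = coo_matrix(A, copy=False)
        # Keep entries on or above the k-th diagonal, i.e. col - row >= k
        mask = A.row + k <= A.col
        row = A.row[mask]
        col = A.col[mask]
        data = A.data[mask]
        # Rebuild a matrix of the original shape from the kept entries
        return coo_matrix((data, (row, col)), shape=A.shape).asformat(format)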
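    Both triu and M>2 build new matrices. For a 300k x 300k input, a sketch of the same masking idea applied directly to the CSR arrays (indptr, indices, data) avoids those intermediates. upper_pairs_above is a hypothetical helper name, and the input is assumed to have no duplicate entries (canonical CSR):

```python
import numpy as np
from scipy import sparse

def upper_pairs_above(M, threshold, k=1):
    """Hypothetical helper: (rows, cols, values) of entries with
    col - row >= k and value > threshold, read straight from CSR storage.
    Assumes M has no duplicate entries."""
    M = sparse.csr_matrix(M)
    # Expand indptr into one row index per stored entry
    rows = np.repeat(np.arange(M.shape[0]), np.diff(M.indptr))
    cols = M.indices
    vals = M.data
    # Same triangle test as triu, combined with the threshold in one mask
    mask = (rows + k <= cols) & (vals > threshold)
    return rows[mask], cols[mask], vals[mask]

M = sparse.csr_matrix(np.array([[9, 1, 2],
                                [1, 9, 5],
                                [2, 5, 9]]))
r, c, v = upper_pairs_above(M, threshold=1)
print(list(zip(r.tolist(), c.tolist(), v.tolist())))
```

    The extra memory is one row-index array with one element per stored nonzero, which is small when the matrix is very sparse.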