
Binarize a sparse matrix in Python in a different way


Assume I have a matrix like:

4 0 3 5
0 2 6 0
7 0 1 0

I want it binarized as:

0 0 0 0
0 1 0 0
0 0 1 0

That is, with the threshold set to 2, any element greater than the threshold is set to 0, and any nonzero element less than or equal to the threshold is set to 1 (zeros stay 0).
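As a dense reference point, the rule can be written as a single NumPy mask. This is only a sketch: it materializes the full array, which is exactly what a sparse approach should avoid, and the variable name threshold is just illustrative.

```python
import numpy as np

# The example matrix from the question, as a dense NumPy array.
a = np.array([[4, 0, 3, 5],
              [0, 2, 6, 0],
              [7, 0, 1, 0]])
threshold = 2

# 1 where the entry is nonzero and at most the threshold, 0 everywhere else.
binarized = ((a != 0) & (a <= threshold)).astype(int)
print(binarized)
```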

Can this be done on SciPy's csr_matrix, or on any other sparse matrix format?

I know scikit-learn offers Binarizer, which replaces values less than or equal to the threshold with 0 and values above it with 1, but that is the opposite of what I need.
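Since Binarizer's rule (values above the threshold become 1) is the inverse of the one wanted here, one way to get the desired matrix is to subtract a Binarizer-style mask from the matrix's nonzero pattern. The sketch below computes that mask with a plain SciPy comparison rather than calling scikit-learn, and it assumes threshold >= 0 so that every entry above the threshold is also nonzero. Both comparisons are false at zero, so neither produces a dense mask.

```python
import numpy as np
from scipy import sparse

s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))
threshold = 2  # assumed >= 0, so (s > threshold) is a subset of (s != 0)

# 1 wherever s holds a nonzero value.
nonzero = (s != 0).astype(np.int64)
# Binarizer-style mask: 1 where the value exceeds the threshold.
above = (s > threshold).astype(np.int64)
# Nonzero entries that are not above the threshold end up as 1, all others 0.
result = nonzero - above
print(result.toarray())
```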


Solution

  • When dealing with a sparse matrix s, avoid inequalities that are true for zero. A sparse matrix (if you're using it appropriately) should contain a great many zeros, so a mask marking every zero location would be nearly dense and could be huge. For example, avoid s <= 2, which is true at every implicit zero; use inequalities that select away from zero instead, such as s > 2 and s != 0.

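    import numpy as np
    from scipy import sparse
    
    s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                    [0, 2, 6, 0],
                                    [7, 0, 1, 0]]))
    
    print(repr(s))
    # <3x4 sparse matrix of type '<class 'numpy.int64'>'
    #   with 7 stored elements in Compressed Sparse Row format>
    # (the exact wording of this repr depends on the SciPy version)
    
    s[s > 2] = 0         # entries above the threshold become 0
    s[s != 0] = 1        # the remaining nonzero entries (all <= 2) become 1
    s.eliminate_zeros()  # drop zeros the first assignment may leave stored
    
    print(s.todense())

    yields

    [[0 0 0 0]
     [0 1 0 0]
     [0 0 1 0]]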
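A variant that follows the same advice even more directly is to work only on the values the CSR matrix actually stores (its data array), so no comparison is ever evaluated on the implicit zeros. The helper name binarize_at_or_below is hypothetical and this is a sketch, not a SciPy function; unlike the in-place assignments above, it returns a new matrix and leaves the input unchanged.

```python
import numpy as np
from scipy import sparse


def binarize_at_or_below(m, threshold):
    """Hypothetical helper: return a CSR copy of sparse matrix m in which every
    stored nonzero value <= threshold becomes 1 and every other entry is 0.

    Only the stored values (out.data) are inspected, so the implicit zeros
    are never touched and no dense mask is built.
    """
    out = m.tocsr(copy=True)
    out.sum_duplicates()  # merge any duplicate entries before testing values
    out.data = ((out.data != 0) & (out.data <= threshold)).astype(out.dtype)
    out.eliminate_zeros()  # drop the entries that were just set to 0
    return out


s = sparse.csr_matrix(np.array([[4, 0, 3, 5],
                                [0, 2, 6, 0],
                                [7, 0, 1, 0]]))
b = binarize_at_or_below(s, 2)
print(b.toarray())
print(b.nnz)  # 2
```

Because eliminate_zeros() is called, the result stores only the two 1 entries; without it, the values set to 0 would remain as explicitly stored zeros.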