python, sparse-matrix, zero, rowwise

How to randomly select one nonzero element per row from a sparse matrix without a for loop in Python


I have a large sparse matrix in which each row contains multiple nonzero elements, for example:

a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])

I want to randomly select one nonzero element per row without a for loop. Any good suggestions? For the output, I am more interested in the chosen elements' indices than in their values.


Solution

  • With a numpy array such as:

    arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])
    

    you can evaluate arr != 0, which gives a boolean array that is True wherever the condition holds, in our case wherever the value is not equal (!=) to 0. So:

    array([ True,  True,  True, False,  True, False, False,  True], dtype=bool)
    

    From here, we can 'index' arr with this boolean array by doing arr[arr != 0], which gives us:

    array([5, 2, 6, 2, 6])
    

    So now that we have a way of removing the zeros from a numpy array, we can use a simple list comprehension over the rows of your a array. For each row, we drop the zeros and then call np.random.choice on what is left (note that the comprehension still loops over the rows in Python). Like so:

    np.array([np.random.choice(r[r!=0]) for r in a])
    

    which gives you back an array of length 3 containing one random non-zero item from each row of a. :)

    Hope this helps!

    Update

    If you want the indexes of the random non-zero numbers in the array, you can use .nonzero().

    So if we have this array:

    arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])
    

    we can do:

    arr.nonzero()
    

    which gives a tuple of the indexes of non-zero elements:

    (array([0, 1, 2, 4, 7]),)
    

    So, as before, we can combine this with np.random.choice() in a list comprehension to produce random indexes:

    a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])
    
    np.array([np.random.choice(r.nonzero()[0]) for r in a])
    

    which returns an array of the form [x, y, z] where x, y and z are random indexes of non-zero elements from their corresponding rows.

    E.g. one result could be:

    array([1, 4, 2])
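
If the per-row comprehension is too slow for a dense array, one loop-free alternative is to give every cell a random key and take the row-wise argmax over the nonzero cells only. This is a minimal sketch, not part of the original answer. It assumes every row has at least one nonzero entry (otherwise the argmax silently returns column 0) and that the array is dense enough to fit a same-shaped float array in memory. The names `rng`, `keys` and `cols` are illustrative.

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])

rng = np.random.default_rng()  # pass a seed here for reproducible picks

# Give every cell a random key in [0, 1), then push the zero cells below
# any valid key so they can never win the argmax.
keys = rng.random(a.shape)
keys[a == 0] = -1.0

# The largest key in each row sits at a uniformly random nonzero position.
cols = keys.argmax(axis=1)
print(cols)
```

Because the keys are independent and identically distributed, each nonzero cell in a row is equally likely to hold the maximum, so the choice is uniform within each row.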
    

    And if you want it to also return the rows, you could just add a numpy.arange() call on the length of a to get an array of row numbers:

    ([np.arange(len(a))], np.array([np.random.choice(r.nonzero()[0]) for r in a]))
    

    so an example random output could be:

    ([array([0, 1, 2])], array([1, 2, 5]))
    

    for a as:

    array([[1, 1, 0, 0, 0, 0],
           [2, 0, 1, 0, 2, 0],
           [3, 0, 4, 0, 0, 3]])
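
Once you have one column index per row, the row numbers and column indexes can be paired up with integer-array indexing to read back the chosen values. A small sketch using the example column picks `[1, 2, 5]` from above (the names `rows`, `cols` and `values` are illustrative):

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])

rows = np.arange(len(a))      # array([0, 1, 2])
cols = np.array([1, 2, 5])    # one chosen column per row

# Pairs rows[i] with cols[i], i.e. a[0, 1], a[1, 2], a[2, 5]
values = a[rows, cols]
print(values)  # [1 1 3]
```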
    

    Hope this does what you want now :)
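
Since the question mentions a large sparse matrix, here is a further loop-free sketch that only touches the nonzero entries. It is an addition, not the original answer's method: it lists the nonzero coordinates once, counts them per row, and draws one random offset into each row's run of nonzeros. It assumes every row has at least one nonzero (otherwise `rng.integers` raises a `ValueError`). The names `counts`, `starts`, `offsets` and `picked` are illustrative.

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])

rng = np.random.default_rng()  # pass a seed here for reproducible picks

# Coordinates of all nonzeros, in row-major order, so `rows` is sorted.
rows, cols = np.nonzero(a)

# Number of nonzeros in each row, and where each row's run starts in `cols`.
counts = np.bincount(rows, minlength=a.shape[0])
starts = np.cumsum(counts) - counts

# One random offset in [0, counts[i]) per row, drawn in a single call.
offsets = rng.integers(0, counts)
picked = cols[starts + offsets]
print(picked)
```

If the data is stored as a scipy.sparse CSR matrix, its `indices` and `indptr[:-1]` attributes should play the same role as `cols` and `starts` here, provided no explicit zeros are stored, so the dense array never has to be built.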