How to randomly select one nonzero element per row from a sparse matrix without a for loop in Python

I have a large sparse matrix in which each row contains multiple nonzero elements, for example:

a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])

I want to randomly select one nonzero element per row without a for loop. Any good suggestions? As output, I am more interested in the index of each chosen element than in its value.

Solution

With a numpy array such as:

arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])

you can evaluate arr != 0, which gives a boolean array that is True wherever the condition holds, in our case wherever the value is not equal (!=) to 0:

array([ True,  True,  True, False,  True, False, False,  True], dtype=bool)

From here, we can index arr with this boolean array by doing arr[arr != 0], which gives us:

array([5, 2, 6, 2, 6])

Now that we have a way of removing the zeros from a numpy array, we can use a simple list comprehension over the rows of your a array. For each row, we remove the zeros and then call np.random.choice on what remains (note that the comprehension is still a Python-level loop, just a compact one). Like so:

np.array([np.random.choice(r[r!=0]) for r in a])

which gives you back an array of length 3 containing one random non-zero item from each row in a. :)

Hope this helps!

Update

If you want the indexes of the random non-zero numbers in the array, you can use .nonzero().

So if we have this array:

arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])

we can do:

arr.nonzero()

which gives a tuple of the indexes of non-zero elements:

(array([0, 1, 2, 4, 7]),)

As before, we can combine this with np.random.choice() in a list comprehension to produce random indexes:

a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])

np.array([np.random.choice(r.nonzero()[0]) for r in a])

which returns an array of the form [x, y, z] where x, y and z are random indexes of non-zero elements from their corresponding rows.

E.g. one result could be:

array([1, 4, 2])
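The list comprehension above still loops over the rows in Python. If a truly loop-free version is needed, one possible sketch (a different technique from the one above, not part of the original answer) is to give every entry a random key, force the keys of the zeros to -1, and take the argmax of each row. The function name random_nonzero_index_per_row and the choice to report -1 for rows without any nonzero entry are assumptions made for illustration.

```python
import numpy as np

def random_nonzero_index_per_row(a, rng=None):
    """Return one random column index of a nonzero entry for each row of a 2-D array."""
    rng = np.random.default_rng() if rng is None else rng
    mask = a != 0
    # Every entry gets a random key in [0, 1); zeros get -1 so they can never win.
    keys = np.where(mask, rng.random(a.shape), -1.0)
    # The largest key in each row marks the chosen nonzero entry.
    cols = keys.argmax(axis=1)
    # A row with no nonzero entries has no valid answer; flag it with -1 (assumed convention).
    cols[~mask.any(axis=1)] = -1
    return cols

a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])
cols = random_nonzero_index_per_row(a)
print(cols)
```

Because the keys are independent and uniform, each nonzero entry in a row should be equally likely to be picked. The catch is that this builds a dense array of random keys with the same shape as a, so it only suits matrices that fit in memory as dense arrays.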

And if you want it to also return the rows, you could just add in a numpy.arange() call on the length of a to get an array of row numbers:

(np.arange(len(a)), np.array([np.random.choice(r.nonzero()[0]) for r in a]))

so an example random output could be:

(array([0, 1, 2]), array([1, 2, 5]))

for a as:

array([[1, 1, 0, 0, 0, 0],
       [2, 0, 1, 0, 2, 0],
       [3, 0, 4, 0, 0, 3]])
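One reason to keep the row numbers next to the column indexes is that the pair can be used to index a directly and pull out the chosen values. A minimal sketch, assuming the same a as above and using a np.random.default_rng() generator in place of np.random.choice (the variable names rows, cols and values are just illustrative):

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])
rng = np.random.default_rng()

# Row numbers 0..n-1 and one random nonzero column index per row.
rows = np.arange(len(a))
cols = np.array([rng.choice(r.nonzero()[0]) for r in a])

# Indexing with a pair of equal-length integer arrays picks a[rows[i], cols[i]] for each i.
values = a[rows, cols]
print(rows, cols, values)
```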

Hope this does what you want now :)
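One more note, since the question mentions a large sparse matrix: if the data is stored in compressed sparse row (CSR) form, the selection can probably be done without any Python loop and without building a dense array. The sketch below works on the two CSR arrays directly; scipy.sparse.csr_matrix exposes arrays with this layout as its indptr and indices attributes. It assumes every row has at least one stored entry and that no zeros are stored explicitly. The function name random_index_per_row_csr is made up for this example, and the CSR arrays are written out by hand so that the snippet only needs numpy.

```python
import numpy as np

def random_index_per_row_csr(indptr, indices, rng=None):
    """Pick one stored column index per row from CSR-style indptr/indices arrays."""
    rng = np.random.default_rng() if rng is None else rng
    indptr = np.asarray(indptr)
    indices = np.asarray(indices)
    # Number of stored entries in each row.
    counts = np.diff(indptr)
    if (counts == 0).any():
        raise ValueError("every row needs at least one stored entry")
    # One random offset in [0, counts[i]) per row, drawn in a single vectorized call.
    offsets = rng.integers(0, counts)
    # indptr[i] is where row i starts inside indices.
    return indices[indptr[:-1] + offsets]

# Dense matrix from the question, kept only to check the result.
a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])
# Hand-written CSR layout of a: row i owns indices[indptr[i]:indptr[i + 1]].
indptr = np.array([0, 2, 5, 8])
indices = np.array([0, 1, 0, 2, 4, 0, 2, 5])

cols = random_index_per_row_csr(indptr, indices)
print(cols)
```

The work here scales with the number of rows rather than with the full dense shape, which is the main reason to prefer it for large sparse inputs.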