Suppose I have a matrix that is 100000 x 100
import numpy as np
mat = np.random.randint(2, size=(100000,100))
I wish to go through this matrix row by row. If a row consists entirely of 1s or entirely of 0s, I wish to change a state variable to that value. Otherwise, I wish to set the entire row to the current value of state. The initial value of state is 0.
Naively, this can be done in a for loop as follows:
state = 0
for row in mat:
    if set(row) == {1}:
        state = 1
    elif set(row) == {0}:
        state = 0
    else:
        row[:] = state
However, as the matrix grows this takes an impractical amount of time. Could someone point me toward how to leverage numpy to vectorize this loop and speed it up?
So for a sample input
array([[0, 1, 0],
       [0, 0, 1],
       [1, 1, 1],
       [0, 0, 1],
       [0, 0, 1]])
The expected output in this case would be
array([[0, 0, 0],
       [0, 0, 0],
       [1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])
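As a sanity check, the loop above can be wrapped in a function and run on the sample input. This is only a sketch: the name `apply_state_loop` is hypothetical, the function works on a copy rather than in place, and it tests rows with `.all()` instead of the `set` comparison.

```python
import numpy as np

def apply_state_loop(mat, state=0):
    """Row-by-row reference version of the loop (hypothetical helper name)."""
    out = mat.copy()  # work on a copy so the caller's array is left untouched
    for row in out:
        if (row == 1).all():      # row is entirely 1s: switch state to 1
            state = 1
        elif (row == 0).all():    # row is entirely 0s: switch state to 0
            state = 0
        else:                     # mixed row: overwrite it with the current state
            row[:] = state
    return out

sample = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 1, 1],
                   [0, 0, 1],
                   [0, 0, 1]])
result = apply_state_loop(sample)
print(result)
```

A slow but obviously correct version like this is handy for checking any vectorized replacement on small random inputs.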
Here is a simple and fast numpy method:
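import numpy as np

def pp():
    # relies on the module-level array `a` and the initial `state`
    m, n = a.shape
    A = a.sum(axis=1)
    # indices of rows that are all 0s or all 1s
    A = np.where((A == 0) | (A == n))[0]
    if not A.size:
        # no uniform row at all: every row takes the initial state
        return np.ones_like(a) if state else np.zeros_like(a)
    # segment boundaries: row 0 (if not itself uniform), each uniform row, the end
    st = np.concatenate([np.arange(A[0] != 0), A, [m]])
    v = a[st[:-1], 0]
    if A[0]:
        # rows before the first uniform row take the initial state
        v[0] = state
    # repeat each segment's value over its length, then broadcast across columns
    return np.broadcast_to(v.repeat(st[1:] - st[:-1])[:, None], (m, n))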
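The function above reads `a` and `state` from the enclosing scope, and `np.broadcast_to` returns a read-only view. For reuse, a variant might take both as arguments and return a writable array. The following is a minimal sketch of that idea, not the answer's original code; `fill_rows` and its local names are made up for illustration.

```python
import numpy as np

def fill_rows(a, state=0):
    """Argument-taking variant of the row-sum approach (hypothetical name)."""
    m, n = a.shape
    sums = a.sum(axis=1)
    # Indices of rows that are all 0s (sum 0) or all 1s (sum n).
    uniform = np.flatnonzero((sums == 0) | (sums == n))
    if uniform.size == 0:
        return np.full_like(a, state)  # no uniform row: all rows take the initial state
    # Each segment starts at a uniform row; add row 0 if it is not uniform itself.
    starts = uniform if uniform[0] == 0 else np.concatenate([[0], uniform])
    values = a[starts, 0].copy()       # value carried by each segment
    if uniform[0] != 0:
        values[0] = state              # leading rows take the initial state
    lengths = np.diff(np.append(starts, m))
    # Expand per-segment values to rows, then copy across columns (writable result).
    return np.repeat(values, lengths)[:, None].repeat(n, axis=1)

sample = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [1, 1, 1],
                   [0, 0, 1],
                   [0, 0, 1]])
out = fill_rows(sample)
print(out)
```

Returning a real copy costs extra memory compared with the broadcast view, so which form is preferable depends on whether the caller needs to modify the result.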
I ran some timings using this setup:
state=0
a = (np.random.random((100000,100))<np.random.random((100000,1))).astype(int)
simple test case:
0.8655898020006134 # me
4.089095343002555 # Alain T.
2.2958932030014694 # Divakar 1
2.2178015549980046 # Divakar 2
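To get a feel for the test matrix used in these timings: each row draws its own probability of a 1, so uniform rows appear mainly when that probability is close to 0 or 1. With 100 columns, the expected share of uniform rows is 2/101, roughly 2%. The sketch below measures that share on a smaller matrix; the seeded generator and the reduced row count are assumptions made here to keep the example quick and repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (assumption, for repeatability)
rows, cols = 10000, 100          # fewer rows than in the timings, to keep it fast

p = rng.random((rows, 1))                       # one probability of a 1 per row
a = (rng.random((rows, cols)) < p).astype(int)  # same construction as in the timings

sums = a.sum(axis=1)
uniform = (sums == 0) | (sums == cols)          # rows that are all 0s or all 1s
print("fraction of uniform rows:", uniform.mean())
```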