I have the following Python snippet to count the number of times the elements x
and y
(from X
and Y
) verify the conditions x<=i
and y<=j
concomitantly, where i
and j
are indices:
import numpy as np
X = np.array([ 8, 7, 9, 5, 3, 2, 4, 10, 6, 1])
Y = np.array([ 1, 3, 10, 9, 5, 7, 8, 2, 6, 4])
n = len(X)
count = 0
expected_count = 0
for i in range(n):
for j in range(n):
count += np.sum((X <= (i+1)) & (Y <= (j+1)))
expected_count += (i + 1) * (j + 1) / n
It works fine, but for performance reasons, I'd like to vectorize this operation.
I know that
i_ = np.arange(1, n+1)
j_ = np.arange(1, n+1)
m_ = i_[:, None] * j_[None, :]
Produces a n
xn
matrix m_
whose elementwise sum will return expected_count
, and seems to be a way to go for count
too. But I lack some more NumPy knowledge to implement the condition.
How can I achieve this, particularly vectorize the (X <= (i+1)) & (Y <= (j+1))
operation in this context?
You can use:
count = ((X <= i_[:,None]) & (Y <= j_[:,None])[:,None]).sum()
count
285
expected_count = m_.sum()/n
302.5
It seems there is a mathematical relation to what you are doing:
Try
((n + 1 - Y) * (n + 1 - X)).sum()
285
Simplifying the equation above, and taking the fact that Y and X contains the values 1...n, we end up with the expression sum(X * Y)
sum(X * Y)
285
X @ Y
285
Also note that the expected can be easily calculated using mathematical formula. ie sum(X)* sum(Y)/n = [n(n+1)/2]**2/n = n(n+1)**2/4
n*(n + 1)**2/4
302.5