Search code examples
pythonnumpyconditional-statementsvectorization

Vectorization of nested for loops with condition in NumPy


I have the following Python snippet to count the number of times the elements x and y (from X and Y) verify the conditions x<=i and y<=j concomitantly, where i and j are indices:

import numpy as np

X = np.array([ 8,  7,  9,  5,  3,  2,  4, 10,  6,  1])
Y = np.array([ 1,  3, 10,  9,  5,  7,  8,  2,  6,  4])

n = len(X)

count = 0
expected_count = 0

for i in range(n):
    for j in range(n):
        count += np.sum((X <= (i+1)) & (Y <= (j+1)))
        expected_count += (i + 1) * (j + 1) / n

It works fine, but for performance reasons, I'd like to vectorize this operation.

I know that

i_ = np.arange(1, n+1)
j_ = np.arange(1, n+1)
m_ = i_[:, None] * j_[None, :]

Produces a nxn matrix m_ whose elementwise sum will return expected_count, and seems to be a way to go for count too. But I lack some more NumPy knowledge to implement the condition.

How can I achieve this, particularly vectorize the (X <= (i+1)) & (Y <= (j+1)) operation in this context?


Solution

  • You can use:

    count = ((X <= i_[:,None]) & (Y  <= j_[:,None])[:,None]).sum()
    count
    285
    
    expected_count = m_.sum()/n
    302.5
    

    It seems there is a mathematical relation to what you are doing:

    Try

    ((n + 1 - Y) * (n + 1 - X)).sum()
    285
    

    Simplifying the equation above, and taking the fact that Y and X contains the values 1...n, we end up with the expression sum(X * Y)

    sum(X * Y)
    285
    
    X @ Y
    285
    

    Also note that the expected can be easily calculated using mathematical formula. ie sum(X)* sum(Y)/n = [n(n+1)/2]**2/n = n(n+1)**2/4

    n*(n + 1)**2/4
    302.5