The best way to reduce the calculation time of formulas in nested loops?

Could you tell me what is the best way to reduce the calculation time of formulas in nested loops? I have code like this:

for k in range(0, 130):
    for i in range(0, 1600):
        array2[i,k] = 0
        for j in range(0, 1600):
            array2[i,k] = array2[i,k] + constant_value * (array1[j,k] * function1(i - j))

Array1 and array2 contain double numbers. This block of code executes in 15-20 minutes. The problem is that the block is nested in another loop that has 100 to 400 iterations, so sometimes it takes more than 24 hours to calculate. Is there a way to speed up this piece of code?

Solution

Numpy is designed to vectorize calculations in Python. To figure out how to vectorize a calculation in nested loops like this, you are best to break it down step by step.

Since you didn't provide them, I'm going to use the following for testing my method:

array1 = np.arange(12).astype(float).reshape((3, 4))

def function1(x):
    return 100.0 - x

constant_value = 2

first let's deal with that function1 in the inner two loops:

test = np.full((3, 3), np.nan)
for i in range(3):
    for j in range(3):
        test[i, j] = function1(i - j)

print(test)
# [[100. 101. 102.]
#  [ 99. 100. 101.]
#  [ 98.  99. 100.]]

You can vectorize this with a 2D meshgrid:

jj, ii = np.meshgrid(np.arange(3), np.arange(3))
print(function1(ii - jj))
# [[100. 101. 102.]
#  [ 99. 100. 101.]
#  [ 98.  99. 100.]]

The array1[j, k] contains k which is the iterator of the outer loop.

So for a given k, we can do this:

array1[:, k] * function1(ii - jj)

Let's check that:

k = 0

# Original code
array2 = np.zeros_like(array1)
for i in range(3):
    for j in range(3):
        array2[i, k] = array2[i, k] + (
            constant_value 
            * array1[j, k] 
            * function1(i - j)
        )
print(array2)
# [[2440.    0.    0.    0.]
#  [2416.    0.    0.    0.]
#  [2392.    0.    0.    0.]]

# Vectorized
print(
    np.sum(
        constant_value 
            * array1[:, k] 
            * function1(ii - jj),
        axis=1
    )
)
# [2440. 2416. 2392.]

To vectorize the outer loop we need to use an additional dimension:

print(np.expand_dims(array1.T, 0).shape)
# (1, 4, 3)

Complete solution:

jj, ii = np.meshgrid(np.arange(3), np.arange(3))
array2 = constant_value * np.sum( 
    np.expand_dims(array1.T, 1) * function1(ii - jj), 
    axis=2
).T
print(array2)
# [[2440. 3046. 3652. 4258.]
#  [2416. 3016. 3616. 4216.]
#  [2392. 2986. 3580. 4174.]]

Check it matches the original code:

# Original code
array2 = np.zeros_like(array1)
for k in range(4):
    for i in range(3):
        for j in range(3):
            array2[i, k] = array2[i, k] + (
                constant_value 
                * array1[j, k] 
                * function1(i - j)
            )
print(array2)
# [[2440. 3046. 3652. 4258.]
#  [2416. 3016. 3616. 4216.]
#  [2392. 2986. 3580. 4174.]]

Update

Based on the comment from the question author below, here is an alternative solution that avoids unnecessary evaluation of function1 and also avoids building the mesh grid in memory (which might be large) by iterating over the indices instead. This should be more memory efficient but might be slower due to the for loop.

# Evaluate and store function for unique values only
ni = array1.shape[0]
function_values = function1(np.arange(1 - ni, ni))

def get_function1_value(i, j):
    return function_values[i - j + ni - 1]

array2 = np.zeros_like(array1)
for i, j in np.ndindex((ni, ni)):
    array2[i] = array2[i] + constant_value * array1[j] * get_function1_value(i, j)

print(array2)
# [[2440. 3046. 3652. 4258.]
#  [2416. 3016. 3616. 4216.]
#  [2392. 2986. 3580. 4174.]]

Update 2

I just noticed that you can use a dot product here:

jj, ii = np.meshgrid(np.arange(ni), np.arange(ni))
array2 = constant_value * np.dot(get_function1_value(ii, jj), array1)