python performance numpy coding-efficiency

How to optimise multi dimension numpy array calculation?

Given a 5D array, the objective is to calculate the difference between the extracted two arrays. For simplicity, say to measure the difference at the second position which can be denoted as bt and lf. The value from these two array can be extracted as follows:

arr[ep, bt, mt, bd, :] - arr[ep, lf, mt, bd, :]

Note that in the above, the index for the first (ep), third (mt) and fourth (bd) axes are the same for both of the arrays, with only the position index of second axis differing (bt and lf).

Based on this requirement, the following code was proposed, and pack under the function nested_for_loop:

import numpy as np
from joblib import Parallel, delayed
np.random.seed(0)

ub_lb_pair = np.tril_indices (5, -1)

arr = np.random.randn(3, 5, 4, 3, 2)
my_shape = arr.shape

def nested_for_loop():
    store_cval = np.full([my_shape[0], 10, my_shape[2], my_shape[3], my_shape[4]],
                         np.nan)  # preallocate
    for ep in range(0, my_shape[0]):
        for mt in range(0, my_shape[2]):
            for bd in range(0, my_shape[3]):
                for idx,(bt, lf) in enumerate(zip(ub_lb_pair[0], ub_lb_pair[1])):
                    store_cval[ep, idx, mt, bd, :] = arr[ep, bt, mt, bd, :] - \
                                                     arr[ep, lf, mt, bd, :]
    return store_cval


store_cval = nested_for_loop()

However, I would like to make the code much more compact and efficient if possible.

One approach I can think of is take advantage of the joblib parallel module, which can be achieve as below as shown under the function multi_prop.

def multi_prop(my_arr, ep):
    temp_ = np.full([10, my_shape[2], my_shape[3], my_shape[4]],
                    np.nan)
    for mt in range(0, my_shape[2]):
        for bd in range(0, my_shape[3]):
            for idx, (bt, lf) in enumerate(zip(ub_lb_pair[0], ub_lb_pair[1])):
                temp_[idx, mt, bd, :] = my_arr[ep, bt, mt, bd, :] - my_arr[ep, lf, mt, bd, :]
                x = 1
    return  temp_

dist_abs = Parallel(n_jobs=-1)(delayed(multi_prop)(arr, ep) for ep in range(0, my_shape[0]))

dist_abs = np.array(dist_abs)
bb = np.array_equal(store_cval, dist_abs)

But, I wonder whether the is a more numpythonic way to achieve the same objective.

Solution

You don't really need any loops at all. Imagine this pair of fancy indices:

bt, lf = np.tril_indices (5, -1)

You are looking for

store_cval = arr[:, bt] - arr[:, lf]

Keep in mind that store_cval[ep, idx, mt, bd, :] = arr[ep, bt, mt, bd, :] - arr[ep, lf, mt, bd, :] is an implicit loop over the last index. They're all loops, and you don't need any of them over the hood.

A more general solution:

def diffs(arr, axis):
    a, b = np.tril_indices(arr.shape[axis], -1)
    ind1 = [slice(None) for _ in range(arr.ndim)]
    ind2 = ind1.copy()
    ind1[axis] = a
    ind2[axis] = b
    return arr[tuple(ind1)] - arr[tuple(ind2)]