How can I improve my custom function vectorization using numpy

I am new to python, and even more new to vectorization. I have attempted to vectorize a custom similarity function that should return a matrix of pairwise similarities between each row in an input array.

IMPORTS:

import numpy as np
from itertools import product
from numpy.lib.stride_tricks import sliding_window_view

INPUT:

np.random.seed(11)

a = np.array([0, 0, 0, 0, 0, 10, 0, 0, 0, 50, 0, 0, 5, 0, 0, 10])
b = np.array([0, 0, 5, 0, 0, 10, 0, 0, 0, 50, 0, 0, 10, 0, 0, 5])
c = np.array([0, 0, 5, 1, 0, 20, 0, 0, 0, 30, 0, 1, 10, 0, 0, 5])

m = np.array((a,b,c))

OUTPUT:

custom_func(m)

array([[   0,  440, 1903],
       [ 440,    0, 1603],
       [1903, 1603,    0]])

FUNCTION:

def custom_func(arr):
    diffs = 0
    max_k = 6
    
    for n in range(1, max_k):

        arr1 = np.array([np.sum(i, axis = 1) for i in sliding_window_view(arr, window_shape = n, axis = 1)])
    
        # this function uses np.maximum and np.minimum to subtract the max and min elements (element-wise) between two rows and then sum up the entire of that subtraction
        diffs += np.sum((np.array([np.maximum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))]) - np.array([np.minimum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))])), axis = 1) * n
    
    diffs = diffs.reshape(len(arr), -1)
    
    return diffs

The function is quite simple, it sums up the element-wise differences between max and minimum of rows in N sliding windows. This function is much faster than what I was using before finding out about vectorization today (for loops and pandas dataframes yay).

My first thought is to figure out a way to find both the minimum and maximum of my arrays in a single pass since I currently THINK it has to do two passes, but I was unable to figure out how. Also there is a for loop in my current function because I need to do this for multiple N sliding windows, and I am not sure how to do this without the loop.

Any help is appreciated!

Solution

Here are the several optimizations you can apply on the code:

use the Numba's JIT to speed up the computation and replace the product call with nested loops
use a more efficient sliding window algorithm (better complexity)
avoid to compute multiple time product and arrange in the loop
reduce the number of implicit temporary arrays allocated (and array Numpy calls)
do not compute the lower triangular part of diffs since it will always be symmetric
(just copy the upper triangular part)
use integer-based indexing rather than slow slow floating-point one

Here is the resulting code:

import numpy as np
from itertools import product
from numpy.lib.stride_tricks import sliding_window_view
import numba as nb

@nb.njit
def custom_func_fast(arr):
    h, w = arr.shape[0], arr.shape[1]
    diffs = np.zeros((h, h), dtype=arr.dtype)
    max_k = 6

    for n in range(1, max_k):
        arr1 = np.empty(shape=(h, w-n+1), dtype=arr.dtype)

        for i in range(h):
            # Efficient sliding window algorithm
            assert w >= n
            s = np.sum(arr[i, 0:n])
            arr1[i, 0] = s
            for j in range(n, w):
                s -= arr[i, j-n]
                s += arr[i, j]
                arr1[i, j-n+1] = s

        # Efficient distance matrix computation
        for i in range(h):
            for j in range(i+1, h):
                s = 0
                for k in range(w-n+1):
                    s += np.abs(arr1[i,k] - arr1[j,k])
                diffs[i, j] += s * n

    # Fill the lower triangular part
    for i in range(h):
        for j in range(i):
            diffs[i, j] = diffs[j, i]

    return diffs

The resulting code is 290 times faster on the example input array on my machine.