python numpy vectorization

How can I improve the vectorization of my custom function using NumPy?


I am new to Python, and even newer to vectorization. I have attempted to vectorize a custom similarity function that should return a matrix of pairwise similarities between the rows of an input array.

IMPORTS:

import numpy as np
from itertools import product
from numpy.lib.stride_tricks import sliding_window_view

INPUT:

np.random.seed(11)

a = np.array([0, 0, 0, 0, 0, 10, 0, 0, 0, 50, 0, 0, 5, 0, 0, 10])
b = np.array([0, 0, 5, 0, 0, 10, 0, 0, 0, 50, 0, 0, 10, 0, 0, 5])
c = np.array([0, 0, 5, 1, 0, 20, 0, 0, 0, 30, 0, 1, 10, 0, 0, 5])

m = np.array((a,b,c))

OUTPUT:

custom_func(m)

array([[   0,  440, 1903],
       [ 440,    0, 1603],
       [1903, 1603,    0]])

FUNCTION:

def custom_func(arr):
    diffs = 0
    max_k = 6
    
    for n in range(1, max_k):

        arr1 = np.array([np.sum(i, axis = 1) for i in sliding_window_view(arr, window_shape = n, axis = 1)])
    
        # for every ordered pair of rows, subtract the element-wise minimum of their
        # window sums from the element-wise maximum, then sum that difference per pair
        diffs += np.sum((np.array([np.maximum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))]) - np.array([np.minimum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))])), axis = 1) * n
    
    diffs = diffs.reshape(len(arr), -1)
    
    return diffs

The function is quite simple: for each window size n from 1 to max_k - 1, it computes the sliding-window sums of every row, then sums the element-wise differences between the maximum and minimum of each pair of rows, weighted by n. This function is much faster than what I was using before finding out about vectorization today (for loops and pandas dataframes, yay).

My first thought is to figure out a way to find both the minimum and maximum of my arrays in a single pass since I currently THINK it has to do two passes, but I was unable to figure out how. Also there is a for loop in my current function because I need to do this for multiple N sliding windows, and I am not sure how to do this without the loop.

Any help is appreciated!


Solution

  • Here are several optimizations you can apply to the code:

    • use Numba's JIT compiler to speed up the computation, and replace the product call with nested loops
    • use a more efficient sliding-window algorithm (better complexity)
    • avoid recomputing product and np.arange on every loop iteration
    • reduce the number of implicit temporary arrays allocated (and the number of NumPy calls)
    • do not compute the lower triangular part of diffs, since the matrix is always symmetric
      (just copy the upper triangular part)
    • use integer-based indexing rather than slow floating-point indexing
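To illustrate the "more efficient sliding window" item on its own, here is a minimal sketch in plain Python/NumPy. The helper name window_sums_running is hypothetical and not part of the original post. It keeps a running sum, adding the element that enters the window and subtracting the one that leaves, so each row costs O(w) instead of O(w * n). Without a JIT compiler the Python loop is slow, so this only shows the idea.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_sums_running(row, n):
    """Return the sum of every length-n window of a 1-D array.

    Hypothetical helper for illustration: a running sum is updated in O(1)
    per step instead of re-summing all n elements of each window.
    """
    w = len(row)
    if n > w:
        raise ValueError("window is longer than the row")
    out = np.empty(w - n + 1, dtype=row.dtype)
    s = row[:n].sum()          # sum of the first window
    out[0] = s
    for j in range(n, w):
        s += row[j] - row[j - n]   # add the entering element, drop the leaving one
        out[j - n + 1] = s
    return out


row = np.array([0, 0, 5, 1, 0, 20, 0, 0, 0, 30, 0, 1, 10, 0, 0, 5])

# Cross-check against the straightforward sliding_window_view approach.
reference = sliding_window_view(row, 3).sum(axis=1)
same = np.array_equal(window_sums_running(row, 3), reference)
print(same)
```
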

    Here is the resulting code:

    import numpy as np
    import numba as nb
    
    @nb.njit
    def custom_func_fast(arr):
        h, w = arr.shape[0], arr.shape[1]
        diffs = np.zeros((h, h), dtype=arr.dtype)
        max_k = 6
    
        for n in range(1, max_k):
            arr1 = np.empty(shape=(h, w-n+1), dtype=arr.dtype)
    
            for i in range(h):
                # Efficient sliding window algorithm
                assert w >= n
                s = np.sum(arr[i, 0:n])
                arr1[i, 0] = s
                for j in range(n, w):
                    s -= arr[i, j-n]
                    s += arr[i, j]
                    arr1[i, j-n+1] = s
    
            # Efficient distance matrix computation
            # (|x - y| equals max(x, y) - min(x, y), so one pass is enough)
            for i in range(h):
                for j in range(i+1, h):
                    s = 0
                    for k in range(w-n+1):
                        s += np.abs(arr1[i,k] - arr1[j,k])
                    diffs[i, j] += s * n
    
        # Fill the lower triangular part
        for i in range(h):
            for j in range(i):
                diffs[i, j] = diffs[j, i]
    
        return diffs
    

    The resulting code is 290 times faster than the original custom_func on the example input array on my machine.
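
If Numba is not an option, the same result can be sketched in pure NumPy. This is an alternative sketch, not the code benchmarked above, and it makes no speed claim. The name custom_func_numpy is hypothetical, and the sketch assumes integer input. It relies on two observations: max(x, y) - min(x, y) equals |x - y|, which answers the "single pass" part of the question, and window sums can be read off a prefix-sum array. Broadcasting builds an (h, h, w - n + 1) temporary per window size, so memory grows quadratically with the number of rows.

```python
import numpy as np


def custom_func_numpy(arr, max_k=6):
    """Pairwise weighted window-sum distances (hypothetical pure-NumPy sketch).

    Assumes a 2-D integer array. For each window size n in 1..max_k-1 it adds
    n * sum_k |S_n[i, k] - S_n[j, k]| to entry (i, j), where S_n holds the
    length-n sliding-window sums of each row.
    """
    arr = np.asarray(arr)
    h, w = arr.shape

    # Prefix sums with a leading zero column: csum[:, k] = sum(arr[:, :k]).
    csum = np.zeros((h, w + 1), dtype=np.int64)
    csum[:, 1:] = np.cumsum(arr, axis=1)

    diffs = np.zeros((h, h), dtype=np.int64)
    for n in range(1, min(max_k, w + 1)):
        # Window sums of length n for every row, shape (h, w - n + 1).
        win = csum[:, n:] - csum[:, :-n]
        # Broadcast rows against each other: |x - y| replaces max - min.
        diffs += n * np.abs(win[:, None, :] - win[None, :, :]).sum(axis=2)
    return diffs


a = np.array([0, 0, 0, 0, 0, 10, 0, 0, 0, 50, 0, 0, 5, 0, 0, 10])
b = np.array([0, 0, 5, 0, 0, 10, 0, 0, 0, 50, 0, 0, 10, 0, 0, 5])
c = np.array([0, 0, 5, 1, 0, 20, 0, 0, 0, 30, 0, 1, 10, 0, 0, 5])
m = np.array((a, b, c))

result = custom_func_numpy(m)
print(result)
```

On the example input from the question, this should reproduce the matrix shown under OUTPUT. The loop over window sizes remains, but it runs only max_k - 1 times, so it is rarely the bottleneck.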