Search code examples
pythonnumpymultidimensional-arrayscipysubtraction

python pairwise subtract two elements in multi-dimensional array consisted of vectors


I wonder if there are very simple ways to calculate pairwise subtraction for two elements in a multi-dimensional array which is consisted of vectors USING a function in NUMPY or SCIPY library.

Let me introduce an example:

>>> a = (np.arange(9)).reshape((3,3)) # a list of 3 vectors (x, y, z)

>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

I want to get the following:

>>>> result
array([[3,3,3],
       [6,6,6],
       [3,3,3]])
# first element comes from [3,4,5] - [0,1,2]
# second element comes from [6,7,8] - [0,1,2]
# third element comes from [6,7,8] - [3,4,5]

I do not matter about the signs (+/-) on the result which depends on the order of subtraction of two vectors. But, I want to know VERY SIMPLE version of code using pre-defined functions in Scipy or Numpy libraries such as scipy.spatial.distance.pdist.

I do need loop codes to iterate elements-wise for result, instead, I need just one line to get the result.


Solution

  • Approach #1

    Get the pairwise indices with np.triu_indices, index into rows of a with those and compute the differences -

    In [8]: r,c = np.triu_indices(len(a),1)
    
    In [9]: a[c] - a[r]
    Out[9]: 
    array([[3, 3, 3],
           [6, 6, 6],
           [3, 3, 3]])
    

    Approach #2

    We can also use slicing that avoids creating the indices and the indexing part itself that creates copies of the input array for the slices needed for subtraction. Thus, we would work with views only, but we need to iterate in that process. The advantage of slicing shows off for large arrays on performance, as we would verify later on in timings. The implementation would be -

    n = len(a)
    N = n*(n-1)//2
    idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
    start, stop = idx[:-1], idx[1:]
    out = np.empty((N,a.shape[1]),dtype=a.dtype)
    for j,i in enumerate(range(n-1)):
        out[start[j]:stop[j]] = a[i+1:] - a[i,None]
    

    Runtime test

    Approaches as funcs -

    def pairwise_row_diff_triu_indices(a):
        r,c = np.triu_indices(len(a),1)
        out = a[c] - a[r]
        return out
    
    def pairwise_row_diff_slicing(a):
        n = len(a)
        N = n*(n-1)//2
        idx = np.concatenate(( [0], np.arange(n-1,0,-1).cumsum() ))
        start, stop = idx[:-1], idx[1:]
        out = np.empty((N,a.shape[1]),dtype=a.dtype)
        for j,i in enumerate(range(n-1)):
            out[start[j]:stop[j]] = a[i+1:] - a[i,None]
        return out
    

    Timings -

    In [53]: np.random.seed(0)
    
    In [54]: a = np.random.randint(0,9,(1000,3))
    
    In [55]: %timeit pairwise_row_diff_triu_indices(a)
        ...: %timeit pairwise_row_diff_slicing(a)
    10 loops, best of 3: 21 ms per loop
    100 loops, best of 3: 6.01 ms per loop
    
    In [56]: a = np.random.randint(0,9,(5000,3))
    
    In [57]: %timeit pairwise_row_diff_triu_indices(a)
        ...: %timeit pairwise_row_diff_slicing(a)
    1 loop, best of 3: 456 ms per loop
    10 loops, best of 3: 110 ms per loop