Tags: python, arrays, numpy, vectorization

Difference between all elements in a row across two 2d NumPy arrays?


I'm relatively new to programming/Python/NumPy, so I apologize if my ignorance is showing...

Say I have two 2d NumPy arrays:

import numpy as np

a = np.array([[1,   5,      20],
              [2,   15,     np.nan],
              [4,   np.nan, np.nan]])
b = np.array([[4,   np.nan],
              [1,   13],
              [5,   np.nan]])

For each row i, I would like to find the differences between every element of b[i] and every element of a[i]. Given the arrays above, the desired result is

[[  3.  nan  -1.  nan -16.  nan]  # <-- differences between all elements of b[0] and a[0]
 [ -1.  11. -14.  -2.  nan  nan]  # <-- differences between all elements of b[1] and a[1]
 [  1.  nan  nan  nan  nan  nan]] # <-- differences between all elements of b[2] and a[2]
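Reading the desired output above, each row i lists b[i, k] - a[i, j], with the column index j of a varying slowest and the column index k of b varying fastest. As a formula (the names D, n_a and n_b, the column counts of a and b, are introduced here only for illustration):

```latex
D_{i,\; j\,n_b + k} = b_{i,k} - a_{i,j},
\qquad 0 \le j < n_a, \quad 0 \le k < n_b
```

So D has one row per row of a and b, and n_a * n_b columns. For example, D[0, 2] (j = 1, k = 0) is b[0, 0] - a[0, 1] = 4 - 5 = -1.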

One way to get this result with a loop is:

outcome = []
for a_row, b_row in zip(a, b):
    # a_row[:, None] is a column, so this broadcasts to an (n_a, n_b) table of b_row[k] - a_row[j]
    outcome.append((b_row - a_row[:, None]).flatten())
outcome = np.array(outcome)
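To see what a single iteration of that loop computes, here is a small sketch using the first rows of a and b from the question (the names a_row, b_row and table are just illustrative). Turning a_row into a column makes NumPy broadcast it against b_row, giving a table of all pairwise differences, and flatten() reads that table row by row:

```python
import numpy as np

a_row = np.array([1.0, 5.0, 20.0])
b_row = np.array([4.0, np.nan])

# a_row[:, None] has shape (3, 1); broadcasting it against b_row (shape (2,))
# gives a (3, 2) table whose entry [j, k] is b_row[k] - a_row[j]
table = b_row - a_row[:, None]
print(table.shape)  # (3, 2)

# flatten() uses row-major (C) order, so j varies slowest and k fastest,
# which is the ordering of the first row of the desired result
print(table.flatten())
```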

Is there a way to do this without looping over the rows that would be faster?


Solution

  • You can transpose the arrays so that the subtraction broadcasts despite their different shapes, and then ravel and reshape the result:

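    # arr[k, j, i] == b[i, k] - a[i, j]
    arr = b.T[:, None] - a.T
    # Fortran-order ravel varies k fastest, then j, then i; works for any number of columns in b
    arr.ravel(order='F').reshape((a.shape[0], a.shape[1] * b.shape[1]))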
    

    Output

    [[  3.  nan  -1.  nan -16.  nan]
     [ -1.  11. -14.  -2.  nan  nan]
     [  1.  nan  nan  nan  nan  nan]]
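A self-contained sketch that checks the transpose approach against the loop from the question. It uses arr.ravel(order='F'), which gives the same result as np.ravel([arr[0], arr[1]], 'F') when b has two columns but does not depend on that column count. It also shows a variant that is not part of the original answer: broadcasting b[:, None, :] against a[:, :, None] directly puts the axes in (row, j, k) order already, so a plain C-order reshape is enough. Note that the equal_nan argument of np.array_equal assumes NumPy 1.19 or newer.

```python
import numpy as np

nan = np.nan
a = np.array([[1, 5, 20],
              [2, 15, nan],
              [4, nan, nan]])
b = np.array([[4, nan],
              [1, 13],
              [5, nan]])

# Reference result: the loop from the question
loop = np.array([(b_row - a_row[:, None]).flatten() for a_row, b_row in zip(a, b)])

# Transpose approach: arr[k, j, i] == b[i, k] - a[i, j]
arr = b.T[:, None] - a.T
transposed = arr.ravel(order='F').reshape((a.shape[0], a.shape[1] * b.shape[1]))

# Direct-broadcasting variant: diff[i, j, k] == b[i, k] - a[i, j],
# shape (rows, n_a, n_b); the C-order reshape puts j slowest and k fastest
diff = b[:, None, :] - a[:, :, None]
direct = diff.reshape(a.shape[0], -1)

# equal_nan=True so NaNs in matching positions compare as equal
print(np.array_equal(loop, transposed, equal_nan=True))  # True
print(np.array_equal(loop, direct, equal_nan=True))      # True
```

Both vectorized versions replace the Python-level row loop with a single array operation; whether that is faster for a given data size is worth measuring rather than assuming.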