Search code examples
pythonpandasnumpydifference

Find absolute minimum between two numpy arrays but keep the sign


Consider two arrays of different length:

A = np.array([58, 22, 86, 37, 64])

B = np.array([105, 212,   5, 311, 253, 419, 123, 461, 256, 464])

For each value in A, I want to find the smallest absolute difference between values in A and B. I use Pandas because my actual arrays are subsets of Pandas dataframes but also because the apply method is a convenient (albeit slow) approach to taking the difference between two different-sized arrays:

In [22]: pd.Series(A).apply(lambda x: np.min(np.abs(x-B)))
Out[22]:
0    47
1    17
2    19
3    32
4    41
dtype: int64

BUT I also want to keep the sign, so the desired output is:

0    -47
1     17
2    -19
3     32
4    -41
dtype: int64

[update] my actual arrays A and B are approximately of 5e4 and 1e6 in length so a low memory solution would be ideal. Also, I wish to avoid using Pandas because it is very slow on the actual arrays.


Solution

  • Comprehension

    I couldn't help myself. This is not what you should do! But, it is cute.

    [min(x - B, key=abs) for x in A]
    
    [-47, 17, -19, 32, -41]
    

    Reduced Big-O Numpy solution

    If N = len(A) and M = len(B) then this solution should be O(N + M log(M)) If B is already sorted, then the sorting step is unnecessary. and this becomes O(N + M)

    C          = np.sort(B)
    a          = C.searchsorted(A)
    
    # It is possible that `i` has a value equal to the length of `C`
    # in which case the searched value exceeds all those found in `C`.
    # In that case, we want to clip the index value to the right most index
    # which is `len(C) - 1`
    right      = np.minimum(a, len(C) - 1)
    
    # For those searched values that are less than the first value in `C`
    # the index value will be `0`.  When I subtract `1`, we'll end up with
    # `-1` which makes no sense.  So we clip it to `0`.
    left       = np.maximum(a - 1, 0)
    

    For clipped values, we'll end up comparing a value to itself and therefore it is safe.

    right_diff = A - C[right]
    left_diff  = A - C[left ]
    
    np.where(np.abs(right_diff) <= left_diff, right_diff, left_diff)
    
    array([-47,  17, -19,  32, -41])