Consider two arrays of different length:
A = np.array([58, 22, 86, 37, 64])
B = np.array([105, 212, 5, 311, 253, 419, 123, 461, 256, 464])
For each value in A
, I want to find the smallest absolute difference between values in A
and B
. I use Pandas
because my actual arrays are subsets of Pandas dataframes but also because the apply
method is a convenient (albeit slow) approach to taking the difference between two different-sized arrays:
In [22]: pd.Series(A).apply(lambda x: np.min(np.abs(x-B)))
Out[22]:
0 47
1 17
2 19
3 32
4 41
dtype: int64
BUT I also want to keep the sign, so the desired output is:
0 -47
1 17
2 -19
3 32
4 -41
dtype: int64
[update] my actual arrays A
and B
are approximately of 5e4 and 1e6 in length so a low memory solution would be ideal. Also, I wish to avoid using Pandas because it is very slow on the actual arrays.
I couldn't help myself. This is not what you should do! But, it is cute.
[min(x - B, key=abs) for x in A]
[-47, 17, -19, 32, -41]
If N = len(A)
and M = len(B)
then this solution should be O(N + M log(M))
If B
is already sorted, then the sorting step is unnecessary. and this becomes O(N + M)
C = np.sort(B)
a = C.searchsorted(A)
# It is possible that `i` has a value equal to the length of `C`
# in which case the searched value exceeds all those found in `C`.
# In that case, we want to clip the index value to the right most index
# which is `len(C) - 1`
right = np.minimum(a, len(C) - 1)
# For those searched values that are less than the first value in `C`
# the index value will be `0`. When I subtract `1`, we'll end up with
# `-1` which makes no sense. So we clip it to `0`.
left = np.maximum(a - 1, 0)
For clipped values, we'll end up comparing a value to itself and therefore it is safe.
right_diff = A - C[right]
left_diff = A - C[left ]
np.where(np.abs(right_diff) <= left_diff, right_diff, left_diff)
array([-47, 17, -19, 32, -41])