I have two sorted numerical arrays of unequal length and I'm looking for a way to match between elements in both arrays such that there's a one-to-one match between (most) gt
and pred
elements. A match is only considered valid if it is below some threshold thrsh
(so there could be unmatched gt
elements), and we define a cost
of a match as the difference between matched gt
and pred
elements.
From here we can define three types of matches:
gt
element, as shown in this answer.gt
elements as possible, regardless of the "cost" of matching.For example:
thrsh=3
gt = [4,8,15,16,23,42,45]
pred = [4,5,7,16,19,44]
# some matching voodoo...
# first_match = [(4,4),(8,5),(15,16),(16,19),(42,44)] # gt23, gt45 and pred5 are unmatched
# greedy_match is the same as first_match
# min_cost_match = [(4,4),(8,7),(16,16),(45,44)] # gt15 and pred19 now unmatched as well
I believe First Match and Greedy Match are always the same (up to the last matched pair), and as mentioned, there's an implementation of First Match here. However, I can't find a way to implement the Min Cost Match, as any iteration may also change matches for previous iteration (depending on how large thrsh
is, this could be very computationally heavy).
Seems like a DP-ish problem that can be solved if you make an MxN table, and do some LCS-like traversal. In numpy, you can do this:
thrsh=3
gt = np.array([4,8,15,16,23,42,45])
pred = np.array([4,5,7,16,19,44])
table = abs(gt[:, None] - pred[None, :])
# array([[ 0, 1, 3, 12, 15, 40],
# [ 4, 3, 1, 8, 11, 36],
# [11, 10, 8, 1, 4, 29],
# [12, 11, 9, 0, 3, 28],
# [19, 18, 16, 7, 4, 21],
# [38, 37, 35, 26, 23, 2],
# [41, 40, 38, 29, 26, 1]])
It is easy to observe the pattern here. Any item which is the minimum cost both row-wise and column-wise (and is also within the threshold), must be a part of the min cost match.
rowwise = np.stack([table.argmin(0), np.arange(table.shape[1])]).T
colwise = np.stack([np.arange(table.shape[0]), table.argmin(1)]).T
rc = (rowwise[:, None] == colwise).all(-1).any(1)
idxs = rowwise[rc]
out = np.stack([gt[idxs[:, 0]], pred[idxs[:, 1]]]).T
# array([[ 4, 4],
# [ 8, 7],
# [16, 16],
# [45, 44]])
After you obtain out
, you can then maybe filter it to see if any match is over the threshold.