Let's say I have two python lists like so:
[30, 400, 500]
[55, 396, 478]
I want to find the sum of minimum (absolute value of the) difference between the elements. In this case it would be easy: (55-30) + (400-396) + (500-478) = 51
But how would I go about doing this efficiently when the lists don't have an equal number of elements. For example:
Set 1:
list1 = [30, 400, 500]
list2 = [412, 489]
or even if it was
Set 2
list1 = [30, 400, 500]
list2 = [24, 563]
lastly,
Set 3
list1 = [30, 50]
list2 = [20, 31, 90]
For Set 1, the answer would be (412-400) + (500-489) = 23
For Set 2, the answer would be (30-24) + (563-500) = 69
For Set 3, the answer would be (30-20) + (50-31) =29
I can't compare by element. In set 1, the sum of the minimum difference is achieved by comparing the second element of list1 to the first element of list2, and the third element of list1 to the second element of list2. In set 2, the sum of the minimum difference is achieved by comparing the first element of list1 to the first element of list2, and the third element of list1 to the second element of list2.
Any help is appreciated.
Some other info:
In order to be sure to get the right answer I would use a bipartite weighted matching, where the abs-diff between each pair are the weights. This will avoid all the pitfalls from sorting based approaches, such as
list1=[30, 50], list2=[20, 31, 90], ans= 29
where most intuitve algorithms would pair 30 with 31. (giving sum 41)
Here is a solution using scipy's linear_sum_assignment
:
import numpy as np
from scipy.optimize import linear_sum_assignment
def min_diff_sum(list1, list2):
arr1 = np.asanyarray(list1)
arr2 = np.asanyarray(list2)
cost_matrix = np.abs(arr1-arr2[:, None])
pairs = linear_sum_assignment(cost_matrix)
return np.sum(cost_matrix[pairs])
This should always give the right result.
In [45]: min_diff_sum([30, 400, 500], [412, 489])
Out[45]: 23
In [46]: min_diff_sum([30, 400, 500], [24, 563])
Out[46]: 69