Search code examples
pythonlistcomparisondifference

Minimum summation of difference of terms in two lists


Let's say I have two python lists like so:

[30, 400, 500]

[55, 396, 478]

I want to find the sum of minimum (absolute value of the) difference between the elements. In this case it would be easy: (55-30) + (400-396) + (500-478) = 51

But how would I go about doing this efficiently when the lists don't have an equal number of elements. For example:

Set 1:

list1 = [30, 400, 500]

list2 = [412, 489]

or even if it was

Set 2

list1 = [30, 400, 500]

list2 = [24, 563]

lastly,

Set 3

list1 = [30, 50]

list2 = [20, 31, 90]

For Set 1, the answer would be (412-400) + (500-489) = 23

For Set 2, the answer would be (30-24) + (563-500) = 69

For Set 3, the answer would be (30-20) + (50-31) =29

I can't compare by element. In set 1, the sum of the minimum difference is achieved by comparing the second element of list1 to the first element of list2, and the third element of list1 to the second element of list2. In set 2, the sum of the minimum difference is achieved by comparing the first element of list1 to the first element of list2, and the third element of list1 to the second element of list2.

Any help is appreciated.

Some other info:

  • The lists will never be more than 2 times longer than the other, but there is no bound on whether list1 is the bigger list or if list2 is the bigger list.
  • The lists will be in sorted order
  • All elements in the shorter list have to be used at least once

Solution

  • In order to be sure to get the right answer I would use a bipartite weighted matching, where the abs-diff between each pair are the weights. This will avoid all the pitfalls from sorting based approaches, such as

    list1=[30, 50], list2=[20, 31, 90], ans= 29
    

    where most intuitve algorithms would pair 30 with 31. (giving sum 41)

    Here is a solution using scipy's linear_sum_assignment:

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    def min_diff_sum(list1, list2):
        arr1 = np.asanyarray(list1)
        arr2 = np.asanyarray(list2)
        cost_matrix = np.abs(arr1-arr2[:, None])
        pairs = linear_sum_assignment(cost_matrix)
        return np.sum(cost_matrix[pairs])
    

    This should always give the right result.

    In [45]: min_diff_sum([30, 400, 500], [412, 489])
    Out[45]: 23
    
    In [46]: min_diff_sum([30, 400, 500], [24, 563])
    Out[46]: 69