Search code examples
pythonpython-3.xlistmultiset

Python3 find how many differences are in 2 lists in order to be equal


Assuming we got 2 lists, always with the same length and always containing strings.

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']

we need to find:

How many items of the list2 should change, in order for it to be equals with list1.

So on the previous example it should return 2

For this example:

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']

it should return 1

and finally for this example:

list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']

it should return 5.

We do not care about which elements should change to what. We neither care about the order, so that means that

['gg', 'gg', 'gg', 'gg', 'gg', 'sot'] 
and
['gg', 'gg', 'sot', 'gg', 'gg', 'gg']

are equal and the result of them should be 0.

The length of the lists could be 6, 8, 20 or whatever and sometimes there are more elements in place.

I tried a lot of things like set(list1) - set(list2) ,list(set(list1).difference(list2)) , set(list1).symmetric_difference(set(list2)) but without any success.


Solution

  • You could leverage the many possibilities Counter offers:

    list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
    list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
    
    from collections import Counter
    
    sum((Counter(list1) - Counter(list2)).values())
    # 2
    

    Lets check with the other examples:

    list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
    list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
    
    sum((Counter(list1) - Counter(list2)).values())
    # 1
    
    list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
    list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']
    
    sum((Counter(list1) - Counter(list2)).values())
    # 5
    
    list1 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot'] 
    list2 = ['gg', 'gg', 'sot', 'gg', 'gg', 'gg']
    
    sum((Counter(list1) - Counter(list2)).values())
    # 0
    

    Details

    By using Counter, you will have a count of all elements from each list in the form of a dictionary. Lets go back to the first example:

    c1 = Counter(list1)
    # Counter({'sot': 2, 'ts': 1, 'gg': 3})
    
    c2 = Counter(list2)
    # Counter({'gg': 5, 'sot': 1})
    

    Now we somehow would like to get an understanding of:

    • Which items are present in list1 but not in list2

    • Out of those that are present and also those there are not, how many more are needed in list2 so that they contain the same amount of counts

    Well we could take advantage of the fact that counters support mathematical operations, the result of which produces multisets, i.e counters that have counts greater than zero. So given that we're looking for the difference between both counters it seems like we could subtract them and see what elements and their respective counts are needed in list2.

    So how would subtraction between Counters work? Lets check with a simple example:

    Counter({1:4, 2: 1}) - Counter({1:1, 3:1})  
    # Counter({1: 3, 2: 1})
    

    So what this doing is subtracting the counts of corresponding elements, so the elements contained in the first counter, thus order here is important. So going back to the proposed example subtracting both lists would yield:

     sub = Counter(list1) - Counter(list2)
    # Counter({'sot': 1, 'ts': 1})
    

    Now we simple need to count the values in all the keys, which can be done with:

    sum(sub.values())
    # 2