Assuming we got 2 lists, always with the same length and always containing strings.
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
we need to find:
How many items of the list2
should change, in order for it to be equals with list1
.
So on the previous example it should return 2
For this example:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
it should return 1
and finally for this example:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']
it should return 5
.
We do not care about which elements should change to what. We neither care about the order, so that means that
['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
and
['gg', 'gg', 'sot', 'gg', 'gg', 'gg']
are equal and the result of them should be 0.
The length of the lists could be 6, 8, 20 or whatever and sometimes there are more elements in place.
I tried a lot of things like set(list1) - set(list2)
,list(set(list1).difference(list2))
, set(list1).symmetric_difference(set(list2))
but without any success.
You could leverage the many possibilities Counter
offers:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
from collections import Counter
sum((Counter(list1) - Counter(list2)).values())
# 2
Lets check with the other examples:
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['gg', 'gg', 'gg', 'gg', 'sot', 'sot']
sum((Counter(list1) - Counter(list2)).values())
# 1
list1 = ['sot', 'sot', 'ts', 'gg', 'gg', 'gg']
list2 = ['ts', 'ts', 'ts', 'ts', 'ts', 'ts']
sum((Counter(list1) - Counter(list2)).values())
# 5
list1 = ['gg', 'gg', 'gg', 'gg', 'gg', 'sot']
list2 = ['gg', 'gg', 'sot', 'gg', 'gg', 'gg']
sum((Counter(list1) - Counter(list2)).values())
# 0
Details
By using Counter
, you will have a count of all elements from each list in the form of a dictionary. Lets go back to the first example:
c1 = Counter(list1)
# Counter({'sot': 2, 'ts': 1, 'gg': 3})
c2 = Counter(list2)
# Counter({'gg': 5, 'sot': 1})
Now we somehow would like to get an understanding of:
Which items are present in list1
but not in list2
Out of those that are present and also those there are not, how many more are needed in list2
so that they contain the same amount of counts
Well we could take advantage of the fact that counters support mathematical operations, the result of which produces multisets
, i.e counters that have counts greater than zero. So given that we're looking for the difference between both counters it seems like we could subtract them and see what elements and their respective counts are needed in list2
.
So how would subtraction between Counters work? Lets check with a simple example:
Counter({1:4, 2: 1}) - Counter({1:1, 3:1})
# Counter({1: 3, 2: 1})
So what this doing is subtracting the counts of corresponding elements, so the elements contained in the first counter, thus order here is important. So going back to the proposed example subtracting both lists would yield:
sub = Counter(list1) - Counter(list2)
# Counter({'sot': 1, 'ts': 1})
Now we simple need to count the values
in all the keys
, which can be done with:
sum(sub.values())
# 2