
Python - Problem with loops while comparing two lists


I have a small problem: I am trying to compare two lists of words to compute a similarity percentage, but if the same word appears twice in each list, I get a wrong percentage.

First, I wrote this small script:

data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'test', 'boom']
res = 0
nb = (len(data1) + len(data2)) / 2
if data1 and data2 and nb != 0:
    for id1, item1 in enumerate(data1):
        for id2, item2 in enumerate(data2):
            if item1 == item2:
                res += 1 - abs(id1 - id2) / nb
    print(res / nb * 100)

The problem is that if the same word appears twice in the lists, the percentage is greater than 100%. To counter that, I added a 'break' just after the line 'res += 1 - abs(id1 - id2) / nb', but the percentage is still wrong.
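
To see where both numbers come from, the script can be wrapped in a small function and run with and without the 'break'. This is only a sketch: the helper name `score` and its `stop_at_first_match` flag are hypothetical and were added for this illustration. Without the 'break', each 'test' in data1 is paired with both 'test' entries in data2, so the cross pairs (0, 3) and (3, 0) add extra weight and the result comes out at roughly 116. With the 'break', the second 'test' (index 3) is paired with the first 'test' of data2 (index 0) rather than the one at its own position, so it is penalized and the result drops to roughly 88.

```python
def score(data1, data2, stop_at_first_match=False):
    """Position-weighted similarity from the script above (hypothetical wrapper)."""
    nb = (len(data1) + len(data2)) / 2
    if not data1 or not data2:
        return 0.0
    res = 0
    for id1, item1 in enumerate(data1):
        for id2, item2 in enumerate(data2):
            if item1 == item2:
                res += 1 - abs(id1 - id2) / nb
                if stop_at_first_match:
                    break  # stops at the FIRST match in data2, not the closest one
    return res / nb * 100


data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'class', 'test', 'boom']
print(round(score(data1, data2), 2))        # no break: duplicate pairs are counted
print(round(score(data1, data2, True), 2))  # with break: second 'test' gets the wrong partner
```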

I hope I've explained my problem clearly. Thank you for your help!


Solution

  • You can use difflib.SequenceMatcher to compare the similarity of the two lists instead. Try this:

    from difflib import SequenceMatcher as sm
    data1 = ['test', 'super', 'class', 'test', 'boom']
    data2 = ['test', 'super', 'class', 'test', 'boom']
    # ratio() returns a float between 0 and 1; scale it to a percentage
    matching_percentage = sm(None, data1, data2).ratio() * 100
    print(matching_percentage)

    Output:

    100.0
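
For lists that are not identical, `ratio()` returns 2*M/T, where M is the number of matched elements and T is the total number of elements in both lists, so the scaled result always stays between 0 and 100. A minimal sketch, assuming a shortened second list (the `data2` below is made up for illustration and does not come from the question):

```python
from difflib import SequenceMatcher

data1 = ['test', 'super', 'class', 'test', 'boom']
data2 = ['test', 'super', 'boom']  # hypothetical shorter list, for illustration only

matcher = SequenceMatcher(None, data1, data2)
# Matched elements: 'test', 'super' and 'boom' -> M = 3, T = 5 + 3 = 8
matching_percentage = matcher.ratio() * 100
print(matching_percentage)  # 75.0
```

Note that SequenceMatcher compares the lists as ordered sequences and does not weight matches by how far apart their positions are, so its score is not the same measure as the one in the original script.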