Tags: python, regex, string-matching, levenshtein-distance, fuzzy-search

Matching two strings that contain the same words from left to right in Python


I am trying to find a way to compare two strings in Python to see whether they match, or are at least similar.

Example:

import difflib
from fuzzywuzzy import fuzz

string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'

# Test with FuzzyWuzzy
print(fuzz.partial_ratio(string1, string2))  # 50
print(fuzz.partial_ratio(string1, string3))  # 100
print(fuzz.partial_ratio(string2, string3))  # 58

# Test with difflib's SequenceMatcher
print(difflib.SequenceMatcher(None, string1, string2).ratio())  # 0.34782608695652173
print(difflib.SequenceMatcher(None, string1, string3).ratio())  # 0.5641025641025641
print(difflib.SequenceMatcher(None, string2, string3).ratio())  # 0.45

In the example above, all three strings should count as similar, because each contains the words "green" and "apple". Is there a matching algorithm that matches strings containing the same words regardless of their order, compares them from left to right, and ignores any words that come after a match has been found (as with string1 and string3)?
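The requirement in the question can also be expressed directly with the standard library. The sketch below is a hypothetical helper (`prefix_word_match` is not part of fuzzywuzzy or difflib): it lowercases both strings, splits them into words, and checks whether every word of the shorter string closely matches one of the first N words of the longer string, where N is the shorter string's word count, so any later words are ignored. Individual words are compared with `difflib.SequenceMatcher`, and the 0.8 threshold is an arbitrary assumption chosen so that "apple" still matches "apples".

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    # Lowercase and keep only runs of letters/digits, so "Apple," -> "apple".
    return re.findall(r"[a-z0-9]+", text.lower())


def word_similarity(a, b):
    # Character-level similarity of two single words, between 0.0 and 1.0.
    return SequenceMatcher(None, a, b).ratio()


def prefix_word_match(s1, s2, threshold=0.8):
    """Hypothetical helper: True if every word of the shorter string has a
    close match (>= threshold) among the first len(shorter) words of the
    longer string. Word order inside that window does not matter."""
    t1, t2 = tokens(s1), tokens(s2)
    short, long_ = (t1, t2) if len(t1) <= len(t2) else (t2, t1)
    # Only look at the leading words of the longer string; words after the
    # candidate match are disregarded.
    window = long_[:len(short)]
    return all(
        max(word_similarity(word, candidate) for candidate in window) >= threshold
        for word in short
    )


string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'

print(prefix_word_match(string1, string2))
print(prefix_word_match(string1, string3))
print(prefix_word_match(string2, string3))
print(prefix_word_match(string1, 'Red apple'))
```

With these inputs the first three calls return True and the last returns False, since "green" and "red" only score 0.5. Unlike a whole-string ratio, this is sensitive to the threshold and to how words are split, so both would need tuning on real data.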


Solution

  • fuzzywuzzy has another scorer, partial_token_set_ratio, which compares the sets of words in the two strings instead of their raw character order. I think it will solve your problem:

    >>> from fuzzywuzzy import fuzz
    >>> string1 = 'Green apple'
    >>> string2 = 'Apple, green'
    >>> string3 = 'Green apples - grow on trees'
    >>> fuzz.partial_token_set_ratio(string1, string3)
    100
    >>> fuzz.partial_token_set_ratio(string1, string2)
    100
    >>> string4 = 'apple'
    >>> fuzz.partial_token_set_ratio(string1, string4)
    100
    >>> fuzz.partial_token_set_ratio(string4, string1)
    100
    >>> string4 = 'app'
    >>> fuzz.partial_token_set_ratio(string4, string1)
    100
    >>> string4 = 'appld'
    >>> fuzz.partial_token_set_ratio(string4, string1)
    80
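To see why these scores come out as they do, here is a rough, standard-library-only approximation of the token-set idea behind partial_token_set_ratio. It is an illustrative sketch, not fuzzywuzzy's actual code: the helper names are hypothetical, the preprocessing only approximates the library's default (lowercase, non-alphanumerics replaced by spaces), and `approx_partial_ratio` brute-forces every same-length window instead of using the library's matching-block strategy, so results may differ on some inputs. The strings are split into the shared words and each side's leftover words, and the best partial match among three combinations is returned.

```python
import re
from difflib import SequenceMatcher


def _process(text):
    # Approximate default preprocessing: lowercase, non-alphanumerics -> spaces.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def approx_partial_ratio(a, b):
    """Best SequenceMatcher score (0-100) of the shorter string against every
    same-length window of the longer string. Empty input scores 0."""
    if not a or not b:
        return 0
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    n = len(short)
    best = max(
        SequenceMatcher(None, short, long_[i:i + n]).ratio()
        for i in range(len(long_) - n + 1)
    )
    return round(100 * best)


def approx_partial_token_set_ratio(s1, s2):
    t1, t2 = set(_process(s1).split()), set(_process(s2).split())
    # Shared words, sorted so word order no longer matters.
    sect = " ".join(sorted(t1 & t2))
    # Shared words followed by the words unique to each side.
    combined_1to2 = (sect + " " + " ".join(sorted(t1 - t2))).strip()
    combined_2to1 = (sect + " " + " ".join(sorted(t2 - t1))).strip()
    return max(
        approx_partial_ratio(sect, combined_1to2),
        approx_partial_ratio(sect, combined_2to1),
        approx_partial_ratio(combined_1to2, combined_2to1),
    )


print(approx_partial_token_set_ratio('Green apple', 'Green apples - grow on trees'))
print(approx_partial_token_set_ratio('appld', 'Green apple'))
```

For the six calls in the answer, this sketch gives the same scores (100 everywhere except 80 for "appld"). It also shows a caveat: whenever the two strings share at least one word, the shared words are a prefix of each combined string, so the partial comparison scores 100. A single common word is therefore enough for a perfect score, which may be more lenient than the question intends.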