I am trying to find a way to compare two strings in Python to see whether they match or are at least similar.
Example:
import difflib
from fuzzywuzzy import fuzz
string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'
# Test with fuzzywuzzy
print(fuzz.partial_ratio(string1, string2))
> 50
print(fuzz.partial_ratio(string1, string3))
> 100
print(fuzz.partial_ratio(string2, string3))
> 58
# Test with difflib's SequenceMatcher
print(difflib.SequenceMatcher(None, string1, string2).ratio())
> 0.34782608695652173
print(difflib.SequenceMatcher(None, string1, string3).ratio())
> 0.5641025641025641
print(difflib.SequenceMatcher(None, string2, string3).ratio())
> 0.45
In the example above, all three strings should count as similar, because each of them contains the words "green apple". Is there a matching algorithm that matches strings containing the same words regardless of their order, and that matches from left to right and disregards any words that come after a match has been found, as with string1 and string3?
There is another method in fuzzywuzzy called partial_token_set_ratio. I think this will solve your problem:
from fuzzywuzzy import fuzz
string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'
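# Extra trailing words are ignored
print(fuzz.partial_token_set_ratio(string1, string3))
> 100
# Word order and punctuation are ignored
print(fuzz.partial_token_set_ratio(string1, string2))
> 100
# A single shared word scores 100, whichever argument comes first
string4 = "apple"
print(fuzz.partial_token_set_ratio(string1, string4))
> 100
print(fuzz.partial_token_set_ratio(string4, string1))
> 100
# A fragment of a word also scores 100
string4 = "app"
print(fuzz.partial_token_set_ratio(string4, string1))
> 100
# A typo lowers the score
string4 = "appld"
print(fuzz.partial_token_set_ratio(string4, string1))
> 80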
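If you want to see roughly what an order-insensitive, "ignore the extra words" comparison involves, here is a minimal sketch using only the standard library. It is not how fuzzywuzzy implements partial_token_set_ratio. The function names tokens and token_overlap_score are made up for this illustration, and the scoring rule (average the best prefix match for each word of the shorter string) is just one assumption about what "similar" should mean here.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase the text and split it into alphanumeric words (hypothetical helper)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap_score(a, b):
    """Return a 0-100 score for how well the words of the shorter string
    are covered by words of the longer one, ignoring order and extra words.

    This is an illustrative scoring rule, not fuzzywuzzy's algorithm.
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0
    # Compare every word of the shorter token list against the longer list.
    short, long_ = (ta, tb) if len(ta) <= len(tb) else (tb, ta)
    total = 0.0
    for word in short:
        # Compare against a prefix of each candidate so that
        # "apple" still fully matches "apples".
        total += max(
            SequenceMatcher(None, word, cand[:len(word)]).ratio()
            for cand in long_
        )
    return round(100 * total / len(short))


string1 = 'Green apple'
string2 = 'Apple, green'
string3 = 'Green apples - grow on trees'

print(token_overlap_score(string1, string2))  # 100
print(token_overlap_score(string1, string3))  # 100
print(token_overlap_score(string2, string3))  # 100
print(token_overlap_score(string1, 'Red banana'))  # well below 100
```

Because only the words of the shorter string are scored, anything that follows the matching words in the longer string has no effect on the result. Whether that is the right trade-off depends on your data, so treat the threshold you pick as something to tune.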