Is there any way to find out whether two strings are similar in meaning, even when the words in them differ?
So far I have tried fuzzywuzzy, Levenshtein distance, and cosine similarity to match the strings, but all of them match the words, not the meaning of the words.
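from fuzzywuzzy import fuzz

# Stand-in for the levenshtein_ratio_and_distance helper, which is used below but not shown.
# Assumption: with ratio_calc=True a substitution costs 2, which reproduces the 0.857 ratios in the output.
def levenshtein_ratio_and_distance(s, t, ratio_calc=False):
    rows, cols = len(s) + 1, len(t) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete i characters
    for j in range(cols):
        dist[0][j] = j  # insert j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else (2 if ratio_calc else 1)
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # substitution
    if ratio_calc:
        return (len(s) + len(t) - dist[-1][-1]) / (len(s) + len(t))
    return dist[-1][-1]

Str1 = "what are types of negotiation"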
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"
Ratio = fuzz.ratio(Str1.lower(),Str2.lower())
Partial_Ratio = fuzz.partial_ratio(Str1.lower(),Str2.lower())
Token_Sort_Ratio = fuzz.token_sort_ratio(Str1,Str2)
Ratio1 = fuzz.ratio(Str1.lower(),Str3.lower())
Partial_Ratio1 = fuzz.partial_ratio(Str1.lower(),Str3.lower())
Token_Sort_Ratio1 = fuzz.token_sort_ratio(Str1,Str3)
print("fuzzywuzzy")
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str2," ",Partial_Ratio)
print(Str1," ",Str2," ",Token_Sort_Ratio)
print(Str1," ",Str3," ",Ratio1)
print(Str1," ",Str3," ",Partial_Ratio1)
print(Str1," ",Str3," ",Token_Sort_Ratio1)
print("levenshtein ratio")
Ratio = levenshtein_ratio_and_distance(Str1,Str2,ratio_calc = True)
Ratio1 = levenshtein_ratio_and_distance(Str1,Str3,ratio_calc = True)
print(Str1," ",Str2," ",Ratio)
print(Str1," ",Str3," ",Ratio1)
output:
fuzzywuzzy
what are types of negotiation what are advantages of negotiation 86
what are types of negotiation what are advantages of negotiation 76
what are types of negotiation what are advantages of negotiation 73
what are types of negotiation what are categories of negotiation 86
what are types of negotiation what are categories of negotiation 76
what are types of negotiation what are categories of negotiation 73
levenshtein ratio
what are types of negotiation what are advantages of negotiation 0.8571428571428571
what are types of negotiation what are categories of negotiation 0.8571428571428571
expected output:
"what are the types of negotiation skill?"
"what are the categories in negotiation skill?"
output: similar
"what are the types of negotiation skill?"
"what are the advantages of negotiation skill?"
output: not similar
You want to score the semantic similarity of two strings.
Fuzzywuzzy and Levenshtein distance only score character-level distance.
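Cosine similarity computed on plain word counts has the same blind spot. In this small sketch (standard library only; `bow_cosine` is a hypothetical helper name), each pair shares four of its five words, so both pairs score 4/5 = 0.8, and the score cannot tell "categories" (close in meaning to "types") from "advantages".

```python
import math
from collections import Counter


def bow_cosine(s1, s2):
    """Cosine similarity of word-count vectors: only exact word overlap counts."""
    c1 = Counter(s1.lower().split())
    c2 = Counter(s2.lower().split())
    dot = sum(c1[w] * c2[w] for w in c1)  # words absent from s2 contribute 0
    norm1 = math.sqrt(sum(n * n for n in c1.values()))
    norm2 = math.sqrt(sum(n * n for n in c2.values()))
    return dot / (norm1 * norm2)


Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"

print(f"{bow_cosine(Str1, Str2):.3f}")  # types vs advantages
print(f"{bow_cosine(Str1, Str3):.3f}")  # types vs categories
```

Both lines print 0.800: a word that differs counts as a total mismatch, whatever it means.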
You need to account for semantic information, so you need a semantic representation of your strings.
Maybe a simple but effective method consists in:
There are certainly better and more complex methods. For a deeper understanding of this subject, I suggest this post (https://medium.com/@adriensieg/text-similarities-da019229c894), which is rich in explanations and code implementations.
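A minimal sketch of one simple semantic representation, assuming averaged word vectors compared with cosine similarity (not necessarily the method meant above). The word vectors in `TOY_VECTORS`, the stop-word list, and the 0.8 threshold are all hypothetical values chosen for illustration; a real system would load pretrained embeddings (for example word2vec or GloVe vectors, or a sentence-embedding model) instead of a hand-made table.

```python
import math

# Hypothetical toy word vectors (made-up values, for illustration only).
# Dimensions loosely stand for: [negotiation topic, "kind/category", "benefit"].
# A real system would load pretrained embeddings instead.
TOY_VECTORS = {
    "negotiation": [1.0, 0.0, 0.0],
    "types": [0.0, 1.0, 0.0],
    "categories": [0.0, 0.9, 0.1],  # deliberately close to "types"
    "advantages": [0.0, 0.0, 1.0],  # deliberately far from "types"
}

# Hypothetical stop-word list: function words that carry little meaning here.
STOP_WORDS = {"what", "are", "the", "of", "in"}


def sentence_vector(text):
    """Average the vectors of the known, non-stop words in text."""
    words = [w.strip("?.,!") for w in text.lower().split()]
    vectors = [TOY_VECTORS[w] for w in words
               if w not in STOP_WORDS and w in TOY_VECTORS]
    if not vectors:
        raise ValueError(f"no known content words in {text!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_similar(s1, s2, threshold=0.8):
    """Hypothetical decision rule: similar if cosine >= threshold (0.8 is an assumption)."""
    return cosine(sentence_vector(s1), sentence_vector(s2)) >= threshold


Str1 = "what are types of negotiation"
Str2 = "what are advantages of negotiation"
Str3 = "what are categories of negotiation"

for other in (Str2, Str3):
    score = cosine(sentence_vector(Str1), sentence_vector(other))
    label = "similar" if is_similar(Str1, other) else "not similar"
    print(f"{Str1!r} vs {other!r}: {score:.3f} -> {label}")
```

Running it prints a score of 0.500 (not similar) for the "advantages" pair and 0.996 (similar) for the "categories" pair, because "types" and "categories" sit close together in the toy vector space while "advantages" does not. With real embeddings the scores will differ, so the threshold has to be tuned on labelled examples.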