python, levenshtein-distance, fuzzywuzzy

Why is this fuzz.ratio giving me 25 when none of the characters match?


I'm trying to work through how fuzzywuzzy calculates this simple fuzz ratio:

from fuzzywuzzy import fuzz

print(fuzz.ratio("66155347", "12026599"))
25

Why is the fuzz ratio not 0, given that the two strings have a different character at every position?

The Levenshtein distance is 8 (because every character needs to be substituted), a is 8 (the length of string 1), and b is 8 (the length of string 2).

fuzz.ratio is (a+b - Levenshtein Distance)/(a+b)

fuzz.ratio is (8+8 - 8)/(8+8) = .50

fuzz.ratio is 50

There also must be something wrong with my math; I'm getting 50.

How does the fuzz ratio arrive at 25?

Any guidance would be appreciated.

Thanks


Solution

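  • The fuzzywuzzy library uses a weighted version of the Levenshtein distance which gives a weight of 2 to replacements, so a replacement costs the same as one deletion plus one insertion. Under that weighting the strings are not completely different: they share a common subsequence of length 2 (for example "65"), so the cheapest edit keeps those 2 characters, deletes the other 6 and inserts 6 new ones. That brings the Levenshtein distance up to 12 (not 16, which is what 8 replacements would cost). Then (8 + 8 - 12) / (8 + 8) = 0.25, which fuzz.ratio reports as 25.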
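
The arithmetic can be checked with a small pure-Python sketch. This is not fuzzywuzzy's actual implementation: the names `edit_distance`, `ratio` and `substitution_cost` are made up for illustration, and it assumes insertions and deletions cost 1 each. It only shows that changing the replacement cost from 1 to 2 moves the distance from 8 to 12 and the ratio from 50 to 25.

```python
def edit_distance(s1, s2, substitution_cost=1):
    """Edit distance with unit-cost inserts/deletes and a configurable substitution cost."""
    # prev[j] is the distance between the already-processed prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            substitute = prev[j - 1] + (0 if c1 == c2 else substitution_cost)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    return prev[-1]


def ratio(s1, s2, substitution_cost=2):
    """Percent similarity: (a + b - distance) / (a + b), rounded to an integer."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100  # two empty strings are treated as identical
    distance = edit_distance(s1, s2, substitution_cost)
    return round(100 * (total - distance) / total)


s1, s2 = "66155347", "12026599"
print(edit_distance(s1, s2, substitution_cost=1))  # 8  (plain Levenshtein)
print(edit_distance(s1, s2, substitution_cost=2))  # 12 (replacement weighted 2)
print(ratio(s1, s2, substitution_cost=1))          # 50 (the number from the question)
print(ratio(s1, s2, substitution_cost=2))          # 25 (matches fuzz.ratio)
```

With a replacement cost of 1 the sketch reproduces the 50 computed in the question; with a cost of 2 it reproduces the 25 that fuzz.ratio returns.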