I need help figuring out why
fuzz.WRatio('Māne', 'mane', force_ascii=True) => 75%
and also
fuzz.WRatio('Māne', 'Mane', force_ascii=True) => 75%
I would expect the force_ascii parameter to enforce more accuracy. Thank you.
fuzz.WRatio in fuzzywuzzy takes two arguments, force_ascii and full_process, that are both True by default. Both control how the strings are preprocessed (force_ascii is only used when full_process is also True; otherwise it is ignored).
1) When using force_ascii=False, full_process=False
The strings are not changed before matching, so e.g. uppercase/lowercase matters.
2) When using force_ascii=False, full_process=True
All non-alphanumeric characters in the strings are replaced with a whitespace, the strings are lowercased, and leading and trailing whitespace is trimmed. So for example
"Mäne!" -> "Mäne " -> "mäne " -> "mäne"
3) When using force_ascii=True, full_process=True
This does the same as 2) but removes all non-ASCII characters beforehand. So for example
"Mäne!" -> "Mne!" -> "Mne " -> "mne " -> "mne"
I do not think it is a good thing that force_ascii defaults to True. I personally do not want this behaviour in 99% of cases, and most people using fuzzywuzzy are not even aware of it.
Besides this, it appears to have a bug, since e.g.
> utils.full_process("ā", force_ascii=True)
'ā'
even though ā is clearly not an ASCII character, so the call should return an empty string.
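To make the preprocessing cases described above concrete, here is a minimal sketch in plain Python. It is an illustrative re-implementation of the steps as described in this answer, not fuzzywuzzy's own code; the function name preprocess is hypothetical. It also shows the result one would expect for "ā" if non-ASCII characters were stripped strictly.

```python
def preprocess(s, force_ascii=True, full_process=True):
    """Illustrative sketch of the described steps (not fuzzywuzzy's code)."""
    if not full_process:
        # Case 1: the string is used exactly as given; force_ascii is ignored.
        return s
    if force_ascii:
        # Case 3: drop every character outside the ASCII range first.
        s = "".join(ch for ch in s if ord(ch) < 128)
    # Case 2: replace non-alphanumeric characters with a space,
    # lowercase, then trim leading and trailing whitespace.
    s = "".join(ch if ch.isalnum() else " " for ch in s)
    return s.lower().strip()


print(preprocess("Mäne!", full_process=False))  # Mäne!
print(preprocess("Mäne!", force_ascii=False))   # mäne
print(preprocess("Mäne!", force_ascii=True))    # mne
print(repr(preprocess("ā", force_ascii=True)))  # ''
```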
In your case, where you want every difference between the two strings to count, you should call
> fuzz.WRatio('Māne', 'mane', full_process=False)
50
> fuzz.WRatio('Māne', 'Mane', full_process=False)
75
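As a rough check on where these numbers come from, the sketch below uses the standard library's difflib.SequenceMatcher, whose ratio is 2*M/T (M matched characters, T total length of both strings). It assumes the basic ratio behind WRatio behaves the same way for these short inputs; WRatio combines several scorers, so this is not a general reimplementation. The helper name similarity is hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # 2 * M / T scaled to 0-100, where M is the number of matched
    # characters and T is the combined length of both strings.
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(similarity("Māne", "mane"))  # 50: only "ne" matches (2 * 2 / 8)
print(similarity("Māne", "Mane"))  # 75: "M" and "ne" match (2 * 3 / 8)
print(similarity("māne", "mane"))  # 75: "m" and "ne" match (2 * 3 / 8)
```

The last line corresponds to both strings being lowercased while ā is kept, which would be consistent with the 75 reported in the question.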