django python-3.x fuzzy-logic fuzzywuzzy

Error with FuzzyWuzzy: StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)

I cannot get the following function to run:

match, match_score = process.extractOne(score, pct_dict.keys())

I get a whitespace error I cannot seem to resolve. Any idea what is causing this?

What it should do: If the score is 15 it should return 0.026

Error:

Error: output = self.func(*resolved_args, **resolved_kwargs)
  File "/code/cleveland/templatetags/percentiles_ratings.py", line 32, in get_percentile_standard
    match, match_score = process.extractOne(score, pct_dict.keys())
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/process.py", line 220, in extractOne
    return max(best_list, key=lambda i: i[1])
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/process.py", line 78, in extractWithoutOrder
    processed_query = processor(query)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 95, in full_process
    string_out = StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/string_processing.py", line 26, in replace_non_letters_non_numbers_with_whitespace
    return cls.regex.sub(" ", a_string)

Code:

from __future__ import unicode_literals
from django import template
from fuzzywuzzy import fuzz
from fuzzywuzzy import process


register = template.Library()


@register.simple_tag
def get_perc(score):
    MATCH_THRESHOLD = 80
    pct_dict = {14: 0.016, 14.7: 0.021, 15.3: 0.026, 16: 0.034, 16.7: 0.04, 17.3: 0.05, 18: 0.07, 18.7: 0.09,
                    19.3: 0.11, 20: 0.13, 20.7: 0.17, 21.3: 0.21, 22: 0.26, 22.7: 0.31, 23.3: 0.38, 24: 0.47}
    if not score:
        return '--'
    elif score < 26.7:
        return '<1'

    match, match_score = process.extractOne(score, pct_dict.keys())

    if match_score >= MATCH_THRESHOLD:
        return pct_dict[match]
    else:
        return '--'

Solution

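As per the fuzzywuzzy documentation, it compares strings. The traceback ends in a regex substitution on the query, which fails because score is a number, so you need to convert your values to strings before comparing them. The choices go through the same default processor, so the numeric dict keys need converting as well:

match, match_score = process.extractOne(str(score), [str(k) for k in pct_dict])

Note that match is then a string, so the lookup becomes pct_dict[float(match)].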

I would not recommend this approach, though, because string similarity does not reflect numeric closeness:

>>> x = ['1','2','3']
>>> y='2'
>>> process.extractOne(y,x)
('2', 100)
>>> y='2.2'
>>> process.extractOne(y,x)
('2', 90)
>>> y = '2.9'
>>> process.extractOne(y,x)
('2', 90)

In the last two entries, both 2.2 and 2.9 match '2' with a score of 90, even though 2.9 is much closer to 3.
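The same effect can be reproduced without fuzzywuzzy. The sketch below uses the standard library's difflib.SequenceMatcher as a stand-in (it is not the scorer fuzzywuzzy uses by default, so the numbers differ from the session above), purely to illustrate that character-level similarity ignores numeric distance. The helper name similarity is made up for this example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-level similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a, b).ratio()


# Both queries share only the character '2' with the choice '2',
# so they get the same ratio regardless of their numeric value.
print(similarity('2.2', '2'))
print(similarity('2.9', '2'))

# '2.9' and '3' share no characters at all, although 2.9 is numerically
# almost 3, so the string-based ratio is zero.
print(similarity('2.9', '3'))
```

Any string-based scorer has this limitation to some degree, which is why a numeric comparison is the better fit here.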

Since your values are numbers, I would recommend simply comparing them numerically, like this:

value = min(pct_dict, key=lambda x: abs(x - score))
# then add some logic to check that value is close enough to score,
# e.g. a static threshold such as `abs(value - score) < .3`
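Putting the nearest-key lookup and the threshold check together could look roughly like the sketch below. The function name nearest_percentile and the tolerance parameter (with its default of 0.35, about half the spacing between keys) are assumptions made for this example, not part of the original code; the table is copied from the question.

```python
# Table copied from the question: keys are scores, values are percentiles.
PCT_DICT = {14: 0.016, 14.7: 0.021, 15.3: 0.026, 16: 0.034, 16.7: 0.04,
            17.3: 0.05, 18: 0.07, 18.7: 0.09, 19.3: 0.11, 20: 0.13,
            20.7: 0.17, 21.3: 0.21, 22: 0.26, 22.7: 0.31, 23.3: 0.38,
            24: 0.47}


def nearest_percentile(score, table=PCT_DICT, tolerance=0.35):
    """Return the percentile for the key nearest to score, or '--'.

    tolerance is a hypothetical cutoff: if the nearest key is further
    away than this, the score is treated as having no match.
    """
    if score is None:
        return '--'
    # Pick the key with the smallest absolute distance to the score.
    nearest = min(table, key=lambda k: abs(k - score))
    if abs(nearest - score) <= tolerance:
        return table[nearest]
    return '--'


print(nearest_percentile(15.2))  # nearest key is 15.3
print(nearest_percentile(30))    # nearest key is 24, outside the tolerance
```

A score that sits exactly halfway between two keys (15 lies between 14.7 and 15.3) is decided by float rounding and dict order, so if that case matters it is worth handling explicitly, for example by always rounding up to the higher key.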

There are a few SO answers that might help you with this.