Tags: django, python-3.x, fuzzy-logic, fuzzywuzzy

Error with FuzzyWuzzy: StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)


I cannot get the following function to run:

match, match_score = process.extractOne(score, pct_dict.keys())

I get a whitespace error I cannot seem to resolve. Any idea what is causing this?

What it should do: if the score is 15, it should return 0.026.

Error:

    output = self.func(*resolved_args, **resolved_kwargs)
  File "/code/cleveland/templatetags/percentiles_ratings.py", line 32, in get_percentile_standard
    match, match_score = process.extractOne(score, pct_dict.keys())
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/process.py", line 220, in extractOne
    return max(best_list, key=lambda i: i[1])
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/process.py", line 78, in extractWithoutOrder
    processed_query = processor(query)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/utils.py", line 95, in full_process
    string_out = StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)
  File "/usr/local/lib/python3.7/site-packages/fuzzywuzzy/string_processing.py", line 26, in replace_non_letters_non_numbers_with_whitespace
    return cls.regex.sub(" ", a_string)

Code:

from __future__ import unicode_literals
from django import template
from fuzzywuzzy import fuzz
from fuzzywuzzy import process


register = template.Library()


@register.simple_tag
def get_perc(score):
    MATCH_THRESHOLD = 80
    pct_dict = {14: 0.016, 14.7: 0.021, 15.3: 0.026, 16: 0.034, 16.7: 0.04, 17.3: 0.05, 18: 0.07, 18.7: 0.09,
                    19.3: 0.11, 20: 0.13, 20.7: 0.17, 21.3: 0.21, 22: 0.26, 22.7: 0.31, 23.3: 0.38, 24: 0.47}
    if not score:
        return '--'
    elif score < 26.7:
        return '<1'

    match, match_score = process.extractOne(score, pct_dict.keys())

    if match_score >= MATCH_THRESHOLD:
        return pct_dict[match]
    else:
        return '--'

Solution

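  • As per the fuzzywuzzy documentation, it compares two strings, so you need to convert your values to strings before comparing them. The traceback ends in a regex substitution, which fails when the query is a number rather than a string. Converting the score looks like this: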

    match, match_score = process.extractOne(str(score), pct_dict.keys())
    

    I would not recommend this approach, though, because fuzzy string matching is not accurate for numbers:

    >>> from fuzzywuzzy import process
    >>> x = ['1','2','3']
    >>> y='2'
    >>> process.extractOne(y,x)
    ('2', 100)
    >>> y='2.2'
    >>> process.extractOne(y,x)
    ('2', 90)
    >>> y = '2.9'
    >>> process.extractOne(y,x)
    ('2', 90)
    

    In the last two entries, both 2.2 and 2.9 match '2' with a score of 90, even though 2.9 is much closer to 3.

    Since your values are numbers, I would recommend simply comparing them numerically, like this:

    value = min(pct_dict, key=lambda x: abs(x - score))
    # Then add some logic to check that value is close enough to score,
    # e.g. a static threshold such as `abs(value - score) < .3`
    

    There are a few SO answers that might help you with this.
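
Putting the numeric comparison together, a minimal sketch might look like the following. The names `nearest_percentile` and `MAX_DISTANCE` are hypothetical, and so is the tolerance of 0.35 (half the largest gap between keys in the question's table). The tie-break toward the higher key is also an assumption, chosen only because the question expects a score of 15 to map to 0.026. The Django `@register.simple_tag` wrapper is left out so the snippet runs on its own.

```python
# Lookup table copied from the question: score -> percentile.
PCT_DICT = {14: 0.016, 14.7: 0.021, 15.3: 0.026, 16: 0.034, 16.7: 0.04,
            17.3: 0.05, 18: 0.07, 18.7: 0.09, 19.3: 0.11, 20: 0.13,
            20.7: 0.17, 21.3: 0.21, 22: 0.26, 22.7: 0.31, 23.3: 0.38,
            24: 0.47}

# Hypothetical tolerance (an assumption, not from the question):
# the largest distance at which a key still counts as a match.
MAX_DISTANCE = 0.35


def nearest_percentile(score, table=PCT_DICT, max_distance=MAX_DISTANCE):
    """Return the percentile of the key nearest to score, or '--' if none is close."""
    if score is None:
        return '--'
    # Pick the key with the smallest absolute distance to the score.
    # Assumption: when two keys are equally close, prefer the higher key.
    nearest = min(table, key=lambda k: (abs(k - score), -k))
    if abs(nearest - score) <= max_distance:
        return table[nearest]
    return '--'


print(nearest_percentile(15))    # key 15.3 wins the tie with 14.7
print(nearest_percentile(23.9))  # nearest key is 24
print(nearest_percentile(30))    # no key within the tolerance
```

Because the comparison is numeric, 2.9-style inputs land on the key that is actually closest, and the tolerance plays the role that `MATCH_THRESHOLD` played in the fuzzy version.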