I'm currently doing string similarity matching on product titles between two different retailers, and I'm using fuzzywuzzy's process.extractOne function to find the best match.
However, I want to set a score threshold so that a product only matches if its score is above that threshold. Currently every product is matched to its closest string, no matter how poor the match is.
The following code gives me the best match (though I'm currently getting errors with it):
title, index, score = process.extractOne(text, choices_dict)
I then tried the following code to try to set a threshold:
title, index, score = process.extractOne(text, choices_dict, score_cutoff=80)
Which results in the following TypeError:
TypeError: cannot unpack non-iterable NoneType object
Finally, I also tried the following code:
title, index, scorer, score = process.extractOne(text, choices_dict, scorer=fuzz.token_sort_ratio, score_cutoff=80)
Which results in the following error:
ValueError: not enough values to unpack (expected 4, got 3)
process.extractOne will return None when the best score is below score_cutoff. So you either have to check for None or catch the exception:
best_match = process.extractOne(text, choices_dict, score_cutoff=80)
if best_match:
    value, score, key = best_match
    print(f"best match is {key}:{value} with the similarity {score}")
else:
    print("no match found")
or
try:
    value, score, key = process.extractOne(text, choices_dict, score_cutoff=80)
    print(f"best match is {key}:{value} with the similarity {score}")
except TypeError:
    print("no match found")
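If you want to see the pattern end to end without installing anything, here is a rough, self-contained sketch. Note the assumptions: extract_one_sketch is a hypothetical stand-in written with the standard library's difflib, not fuzzywuzzy itself, so its scores will differ from the real scorers. It only mimics the two behaviours that matter here: for a dict of choices the result is a (value, score, key) tuple, and None comes back when nothing reaches score_cutoff. match_product and the sample choices_dict are also made up for illustration.

```python
from difflib import SequenceMatcher


def extract_one_sketch(query, choices, score_cutoff=0):
    """Hypothetical stand-in for process.extractOne over a dict of choices.

    Returns (value, score, key) for the best-scoring choice, or None when
    no choice scores at least score_cutoff.
    """
    best = None
    for key, value in choices.items():
        # difflib ratio is 0..1; scale to 0..100 like a fuzzy-match score
        ratio = SequenceMatcher(None, query.lower(), value.lower()).ratio()
        score = round(100 * ratio)
        if score >= score_cutoff and (best is None or score > best[1]):
            best = (value, score, key)
    return best


def match_product(query, choices, score_cutoff=80):
    """Return a small dict for a match, or None if nothing passes the cutoff."""
    result = extract_one_sketch(query, choices, score_cutoff=score_cutoff)
    if result is None:
        # Unpacking here would raise the TypeError from the question
        return None
    value, score, key = result  # three items, in this order
    return {"title": value, "score": score, "index": key}


# Made-up sample data
choices_dict = {0: "Apple iPhone 12 64GB Black", 1: "Samsung Galaxy S21 128GB"}

hit = match_product("apple iphone 12 64gb black", choices_dict)
miss = match_product("garden hose 30m", choices_dict)
print(hit)
print(miss)
```

The same None check should carry over if you swap the stand-in for the real process.extractOne call. It also shows why unpacking into four names fails: the result only ever has three items, and the score is the second one, not the third.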