python nlp string-matching fuzzywuzzy

Why doesn't fuzzywuzzy's process.extractBests give a 100% score when the tested string 100% contains the query string?


I'm testing fuzzywuzzy's process.extractBests() as follows:

from fuzzywuzzy import process

# Define the query string
query = "Apple"

# Define the list of choices
choices = ["Apple", "Apple Inc.", "Apple Computer", "Apple Records", "Apple TV"]

# Call the process.extractBests function
results = process.extractBests(query, choices)

# Print the results
for result in results:
    print(result)

It outputs:

('Apple', 100)
('Apple Inc.', 90)
('Apple Computer', 90)
('Apple Records', 90)
('Apple TV', 90)

Why didn't the scorer give 100 to all strings since they all 100% contain the query string ("Apple")?

I use fuzzywuzzy==0.18.0 with Python 3.11.7.


Solution

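  • process.extractBests() scores how similar two whole strings are; it does not test whether the choice contains the query. By default it uses the fuzz.WRatio scorer, which combines several measures. When one string is more than 1.5 times as long as the other (after the default preprocessing, which lowercases the text and replaces punctuation with spaces), WRatio falls back to partial matching and multiplies the partial score by 0.9. "Apple" appears in full inside "Apple Inc.", so the partial score is 100, and 100 × 0.9 gives the 90 you see. Only the choice "Apple" has the same length as the query, so only it keeps the unscaled score of 100. If you want 100 whenever the query appears inside the choice, pass a substring-oriented scorer, for example process.extractBests(query, choices, scorer=fuzz.partial_ratio) after from fuzzywuzzy import fuzz. I hope this helps!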
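
To make the length penalty concrete, here is a rough, standard-library-only sketch of the scoring idea. It is not fuzzywuzzy's actual code: the helper names (normalize, ratio, partial_ratio, weighted_score) are made up for illustration, the partial match is approximated with a simple sliding window, and the token-based scorers that WRatio also considers are left out. The 1.5 length threshold and the 0.9 and 0.6 scale factors are the ones WRatio uses.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    # Rough stand-in for fuzzywuzzy's default preprocessing:
    # lowercase, turn non-alphanumeric characters into spaces, trim.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def ratio(a, b):
    # Whole-string similarity on a 0-100 scale.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a, b):
    # Best score of the shorter string against any same-length
    # window of the longer one (simplified sliding-window version).
    shorter, longer = sorted((a, b), key=len)
    windows = range(len(longer) - len(shorter) + 1)
    return max(ratio(shorter, longer[i:i + len(shorter)]) for i in windows)


def weighted_score(query, choice):
    # Simplified illustration of the WRatio length rule.
    a, b = normalize(query), normalize(choice)
    if not a or not b:
        return 0
    base = ratio(a, b)
    len_ratio = max(len(a), len(b)) / min(len(a), len(b))
    if len_ratio < 1.5:
        # Similar lengths: the whole-string score is used as is.
        return base
    # Very different lengths: use the partial score, but scaled down.
    scale = 0.6 if len_ratio > 8 else 0.9
    return max(base, round(partial_ratio(a, b) * scale))


choices = ["Apple", "Apple Inc.", "Apple Computer", "Apple Records", "Apple TV"]
for choice in choices:
    print(choice, weighted_score("Apple", choice))
# Apple 100
# Apple Inc. 90
# Apple Computer 90
# Apple Records 90
# Apple TV 90
```

The sketch shows that every longer choice already has a perfect partial match of 100; the 90 comes only from the 0.9 scale factor applied when the lengths differ by more than 1.5 times.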