Tags: python, pandas, fuzzy-comparison

Fuzzy scoring top N in Python 3?


I am trying to build a DataFrame of words and their fuzzywuzzy scores, and then take the top 5.

For example, my test word is test = "kuku".

My bag of words is:

words = ["tutu", "pupu", "lulu", "kuko", "dfvfd", "wwwer"]

I have done the following so far:

import pandas as pd
from fuzzywuzzy import fuzz

test = "kuku"
# Print each word with its similarity score against the test word
for word in words:
    print(word, fuzz.ratio(word, test))

But I want to be able to sort the words by score and take the top N.

What is the best practice for solving this?


Solution

  • process.extract() from fuzzywuzzy returns a list of (choice, score) tuples sorted by score in descending order, so the first match is the best one. By default it returns at most 5 matches (limit=5).

    from fuzzywuzzy import process

    query = 'sat'
    choices = ['slate', 'saturn', 'satellite', 'sat', 'shore']
    # limit=3 keeps only the three highest-scoring (choice, score) pairs
    print(f"top 3: {process.extract(query, choices, limit=3)}")
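
    To make the sort-and-slice step explicit for the words in the question, here is a minimal sketch. It assumes a stand-in scorer built on the standard library's difflib so that it runs without extra packages; simple_ratio and top_n are hypothetical helper names, not part of fuzzywuzzy, and the scores may differ from what fuzzywuzzy reports.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Stand-in scorer (assumption): 0-100 similarity from difflib, not fuzzywuzzy."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def top_n(query, words, n=5, scorer=simple_ratio):
    """Return the n best (word, score) pairs, highest score first."""
    scored = [(word, scorer(word, query)) for word in words]
    # sorted() is stable, so words with equal scores keep their input order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


words = ["tutu", "pupu", "lulu", "kuko", "dfvfd", "wwwer"]
top3 = top_n("kuku", words, n=3)
print(top3)
```

    If fuzzywuzzy is installed, passing scorer=fuzz.ratio should work the same way, since it also takes two strings and returns a number. The resulting list of pairs can then be turned into a DataFrame with pd.DataFrame(top3, columns=["word", "score"]).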