I am trying to build a dataframe of words and their fuzzywuzzy scores, and then take the top 5.
For example, my test word is test = "kuku".
My bag of words is:
words = ["tutu", "pupu", "lulu", "kuko", "dfvfd", "wwwer"]
I have done the following so far:
import os
import pandas as pd
from fuzzywuzzy import fuzz
test = "kuku"
for i in words:
    print(i, fuzz.ratio(i, test))
But I want to be able to sort by score and take the top N.
What is the best practice for solving this?
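process.extract() from fuzzywuzzy returns a list of (choice, score) tuples sorted by score in descending order, so the first match is the best one. By default it returns at most 5 matches (controlled by the limit parameter).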
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
query = 'sat'
choices = ['slate', 'saturn', 'satellite', 'sat', 'shore']
print(f"top 3: {process.extract(query, choices)[:3]}")