
Python 3.6: using pandas and difflib.get_close_matches to filter a DataFrame with user input


I have loaded a CSV file into a pandas DataFrame, and I am trying to search one of its columns for entries similar to a user-supplied input. I have never used difflib before. My attempts have ended either in TypeError: object of type 'float' has no len() or in an empty list [].

import difflib
import pandas as pd

df = pd.read_csv("Vendorlist.csv", encoding= "ISO-8859-1")
word = input ("Enter a vendor: ")

def find_it(w):
    w = w.lower()
    return difflib.get_close_matches(w, df.vendorname, n=50, cutoff=.6)

alternatives = find_it(word)
print (alternatives)

The error seems to occur at "return difflib.get_close_matches(w, df.vendorname, n=50, cutoff=.6)".

I am trying to find entries in the column called 'vendorname' that are similar to "word".

Help is greatly appreciated.


Solution

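  • Your vendorname column does not contain only strings. Most likely it has empty cells, which pandas reads as NaN (a float). get_close_matches calls len() on every candidate, and that call raises TypeError: object of type 'float' has no len().

    Convert the column to strings in your return statement: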

    return difflib.get_close_matches(w, df.vendorname.astype(str), n=50, cutoff=.6)
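The following self-contained reproduction shows both the error and the fix. It assumes that blank cells in Vendorlist.csv are what became NaN. The DataFrame is hypothetical stand-in data. find_matches is a hypothetical helper that only wraps the call from the question:

```python
import difflib

import numpy as np
import pandas as pd

# Hypothetical stand-in for Vendorlist.csv; a blank cell is read by pandas as NaN (a float).
df = pd.DataFrame({"vendorname": ["acme corp", np.nan, "acme supply", "globex"]})


def find_matches(word, names):
    # Same call as in the question: up to 50 names scoring at least 0.6.
    return difflib.get_close_matches(word, names, n=50, cutoff=0.6)


try:
    find_matches("acme", df.vendorname)
    error = None
except TypeError as exc:
    # difflib calls len() on each candidate, which fails on the float NaN.
    error = str(exc)
print(error)

# After astype(str), the NaN becomes the string "nan" and len() works.
fixed = find_matches("acme", df.vendorname.astype(str))
print(fixed)
```

Note that astype(str) turns NaN into the literal string "nan", which then becomes a candidate itself. If blank vendors should never be suggested, df.vendorname.dropna().astype(str) avoids that.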
    

    import difflib
    import pandas as pd
    
    df = pd.read_csv("Vendorlist.csv", encoding="ISO-8859-1")
    word = input("Enter a vendor: ")
    
    def find_it(w):
        w = w.lower()
        # astype(str) turns NaN (float) cells into strings, so difflib can call len() on every value
        return difflib.get_close_matches(w, df.vendorname.astype(str), n=50, cutoff=.6)
    
    alternatives = find_it(word)
    print (alternatives)
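There is one more thing to watch. find_it lowercases only the user's input, and difflib compares strings case-sensitively, so a lowercase query can miss a capitalized vendor name. The following is one possible case-insensitive variant. It assumes the original spelling should still be shown to the user. find_vendor and the sample names are hypothetical:

```python
import difflib

import numpy as np
import pandas as pd


def find_vendor(word, names, n=50, cutoff=0.6):
    """Case-insensitive fuzzy match that returns the original spellings.

    Assumption: blank cells (NaN) should never be suggested, so they are
    dropped rather than converted to the string "nan".
    """
    names = names.dropna().astype(str)
    # Map each lowercased name back to the first original spelling seen.
    lookup = {}
    for name in names:
        lookup.setdefault(name.lower(), name)
    matches = difflib.get_close_matches(word.lower(), list(lookup), n=n, cutoff=cutoff)
    return [lookup[m] for m in matches]


# Hypothetical data: the stored name is capitalized, the query is not.
df = pd.DataFrame({"vendorname": ["ACME Corp", np.nan, "Globex"]})
print(find_vendor("acme corp", df.vendorname))
```

The dictionary keeps only the first spelling for each lowercased name, so names that differ only in case are suggested once.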
    

    As @johnchase noted in the comments:

    The question also mentions an empty list being returned. get_close_matches returns a list of matches; if no item scores at or above the cutoff, an empty list is returned. – johnchase
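To see why an empty list comes back, check the score itself. get_close_matches keeps a candidate only if its SequenceMatcher.ratio() is at least cutoff. A short illustration with made-up vendor names:

```python
import difflib

# Made-up candidates for illustration.
vendors = ["acme corporation", "globex"]
word = "acme"

# ratio() is the similarity score that get_close_matches compares with cutoff;
# here 2 * 4 matching characters / (16 + 4) total characters.
score = difflib.SequenceMatcher(None, "acme corporation", word).ratio()
print(score)

# No candidate reaches 0.6, so the result is an empty list rather than an error.
strict = difflib.get_close_matches(word, vendors, n=50, cutoff=0.6)
# A lower cutoff admits the longer, less similar name.
loose = difflib.get_close_matches(word, vendors, n=50, cutoff=0.3)
print(strict, loose)
```

Short queries against long stored names score low, so a lower cutoff may be needed. A suitable value depends on the data.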