python, pandas, dataframe, fuzzywuzzy, fuzzy-comparison

Why do I get a KeyError when I do a merge?


Hi, please help. I am trying to do a fuzzy merge of two datasets with pandas and fuzzywuzzy, using two columns from each. I get a traceback at the line before the print call that says KeyError: ('name', 'lasntname'). I do not know whether I am referencing the columns incorrectly. I have tried double brackets and parentheses, with no luck.

Here's the code:

import pandas as pd
from fuzzywuzzy import fuzz, process
from itertools import product

N = 80
names = {tup: fuzz.ratio(*tup) for tup in
         product(df1["Name"].tolist(), df2["name"].tolist())}

s1 = pd.Series(names)
s1 = s1[s1 > N]
s1 = s1[s1.groupby(level=0).idxmax()]

surnames = {tup: fuzz.ratio(*tup) for tup in
            product(df1["Last_name"].tolist(), df2["lasntname"].tolist())}

s2 = pd.Series(surnames)
s2 = s2[s2 > N]
s2 = s2[s2.groupby(level=0).idxmax()]

# map and fill nulls
df2["name"] = df2["name"].map(s1).fillna(df2["name"])
df2["lasntname"] = df2["lasntname"].map(s2).fillna(df2["lasntname"])

df = df1.merge(df2, on=["name", "lasntname"], how='outer')
print(df)

Solution

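  • Hi, just make the column names the same in both tables and the merge should work. merge(..., on=[...]) looks for the listed key columns in both DataFrames, and here df1 has "Name" and "Last_name" while df2 has "name" and "lasntname". Either rename the columns so they match, or keep the names as they are and pass left_on and right_on instead of on.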
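
To make the point about column names concrete, here is a minimal sketch. It assumes two tiny made-up frames that reuse the column names from the question; the sample values and the extra "age" and "city" columns are hypothetical. It first reproduces the KeyError, then shows two ways around it: renaming the columns, or using left_on/right_on. The fuzzy-matching step is left out on purpose, since the error comes from the merge keys alone.

```python
import pandas as pd

# Hypothetical sample data; only the key column names come from the question.
df1 = pd.DataFrame({
    "Name": ["John", "Mary"],
    "Last_name": ["Smith", "Jones"],
    "age": [30, 25],
})
df2 = pd.DataFrame({
    "name": ["John", "Mary"],
    "lasntname": ["Smith", "Jones"],
    "city": ["Leeds", "York"],
})

# Reproduce the problem: df1 has no "name" or "lasntname" column,
# so merging with on=[...] fails with a KeyError.
try:
    df1.merge(df2, on=["name", "lasntname"], how="outer")
    raised = False
except KeyError:
    raised = True

# Option 1: rename df2's key columns to match df1, then merge on the shared names.
df2_renamed = df2.rename(columns={"name": "Name", "lasntname": "Last_name"})
merged_on = df1.merge(df2_renamed, on=["Name", "Last_name"], how="outer")

# Option 2: leave the names alone and tell merge which columns pair up.
merged_lr = df1.merge(
    df2,
    left_on=["Name", "Last_name"],
    right_on=["name", "lasntname"],
    how="outer",
)

print(merged_on)
print(merged_lr)
```

Option 1 gives a single pair of key columns in the result. Option 2 keeps both pairs ("Name"/"Last_name" and "name"/"lasntname"), so you may want to drop one pair afterwards.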