I am writing a function to aid DataFrame merges between two tables. The function creates a mapping key in the first DataFrame using variables in the second DataFrame.
My issue arises when I chain .fillna(method=...) onto the merge at the end of the function.
# Import libraries
import pandas as pd
# Create data
data_1 = {"col_1": [1, 2, 3, 4, 5], "col_2": [1, None, 3, None, 5]}
data_2 = {"col_1": [1, 2, 3, 4, 5], "col_3": [1, None, 3, None, 5]}
df = pd.DataFrame(data_1)
df2 = pd.DataFrame(data_2)
def merge_on_key(df, df2, join_how="left", fill_na=None):
    # Import libraries
    import pandas as pd
    # Code to create mapping key not required for question
    # Merge the two dataframes
    print(fill_na)
    print(type(fill_na))
    df3 = pd.merge(df, df2, how=join_how, on="col_1").fillna(method=fill_na)
    return df3
df3 = merge_on_key(df, df2)
output:
>>> None
>>> <class 'NoneType'>
error message:
ValueError: Must specify a fill 'value' or 'method'
My question: why does fillna(method=fill_na) fail when fill_na is None, given that None is already the default value of the method parameter?
You have to use either a 'value' or a 'method'. In your call to fillna you are setting both of them to None. In short, you're telling pandas to fill the missing (None) values in the DataFrame with None, which does nothing, so it raises an exception.
Based on the docs, you could either pass a non-empty value (this works as long as fill_na stays None, since pandas rejects a call that sets both arguments):
df3 = pd.merge(df, df2, how=join_how, on="col_1").fillna(value=0, method=fill_na)
or change the method from None (which means "directly substitute the missing values in the DataFrame with the given value") to one of {'backfill', 'bfill', 'pad', 'ffill'} (each described in the docs):
df3 = pd.merge(df, df2, how=join_how, on="col_1").fillna(method='backfill')
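If you want to keep fill_na=None as a "do not fill" default, one option is to skip the fill step entirely when nothing is requested. The sketch below is only one way to do it, not the only one: it assumes the missing entries in the sample data are None, and it uses the DataFrame.ffill() and DataFrame.bfill() methods rather than fillna(method=...), since recent pandas releases deprecate the method argument. The ValueError for unknown names is my own addition for illustration.

```python
import pandas as pd


def merge_on_key(df, df2, join_how="left", fill_na=None):
    """Merge df and df2 on col_1, optionally filling missing values."""
    merged = pd.merge(df, df2, how=join_how, on="col_1")
    if fill_na is None:
        # Nothing requested: return the merge result untouched.
        return merged
    if fill_na in ("ffill", "pad"):
        # Propagate the last valid value forward.
        return merged.ffill()
    if fill_na in ("bfill", "backfill"):
        # Propagate the next valid value backward.
        return merged.bfill()
    # Hypothetical guard, not part of the original function.
    raise ValueError(f"Unsupported fill_na: {fill_na!r}")


# Sample data, assuming the gaps in the question are missing values.
df = pd.DataFrame({"col_1": [1, 2, 3, 4, 5], "col_2": [1, None, 3, None, 5]})
df2 = pd.DataFrame({"col_1": [1, 2, 3, 4, 5], "col_3": [1, None, 3, None, 5]})

unfilled = merge_on_key(df, df2)                # missing values kept
filled = merge_on_key(df, df2, fill_na="ffill")  # missing values forward-filled
```

With this shape the default call no longer reaches fillna at all, so the "Must specify a fill 'value' or 'method'" error cannot occur, and callers opt in to filling by naming a method.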