Suppose I have some sample code like this:
import numpy as np
import pandas as pd
_d = pd.DataFrame([[1, 2, 3], [4, np.nan, 6], [np.nan, np.nan, 8]], columns=['x', 'y', 'z'])
Now I have a function that checks a value and returns a desired value depending on the scenario:
def handling_nan(_d):
    if _d['x'] == 1.0:
        return 100
    else:
        return _d
When I use it in the code below,
_result=_d.apply(lambda x:handling_nan(x))
_result
I get this error:
KeyError: ('x', 'occurred at index x')
UPDATE A:
In short, I am using the Kaggle dataset "Titanic: Machine Learning from Disaster", and I want to add a new column based on a condition like this:
If the passenger is male and the age is NaN, insert the mean age of the male passengers instead of NaN; if the passenger is female and the age is NaN, insert the mean age of the female passengers instead of NaN.
The KeyError is raised inside the function because the apply() method on a DataFrame defaults to axis=0. This means the function is applied to every column, not every row, so the Series it receives is indexed by the row labels (0, 1, 2) and has no 'x' key. To fix the error, change the apply() call to:
_result=_d.apply(lambda x:handling_nan(x), axis=1)
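As a rough illustration of the difference between the two axes, here is a minimal sketch on the same toy data. The helper flag_row is a hypothetical stand-in for handling_nan that returns a scalar for every row, which keeps the result a plain Series; it is not the function from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, np.nan, 6], [np.nan, np.nan, 8]],
    columns=["x", "y", "z"],
)

def flag_row(row):
    # Hypothetical helper: expects a row, i.e. a Series indexed by column names.
    return 100 if row["x"] == 1.0 else row["x"]

# Default axis=0: each call receives a column indexed by 0, 1, 2,
# so looking up "x" fails.
raised_key_error = False
try:
    df.apply(flag_row)
except KeyError:
    raised_key_error = True

# axis=1: each call receives a row indexed by "x", "y", "z".
result = df.apply(flag_row, axis=1)
```

With axis=1 the lookup row["x"] succeeds, and result holds one value per row of the original frame.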
Looking at the edit, the goal is to replace NaNs with grouped means in the dataset. This can be done using the fillna() and transform() methods as follows:
l = [["M", 30], ["M", 45], ["M", None], ["F", 76], ["F", 23], ["F", None]]
df = pd.DataFrame(l, columns=["sex", "age"])
df['age'] = df['age'].fillna(df.groupby("sex")['age'].transform('mean'))
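To see why that one-liner works, it may help to look at the intermediate result of transform('mean'). The sketch below splits the step in two on the same toy data; the names rows, group_means and filled are only illustrative.

```python
import pandas as pd

rows = [["M", 30], ["M", 45], ["M", None], ["F", 76], ["F", 23], ["F", None]]
df = pd.DataFrame(rows, columns=["sex", "age"])

# transform('mean') returns one value per original row (same index as df),
# holding the mean age of that row's group; NaNs are skipped when averaging.
group_means = df.groupby("sex")["age"].transform("mean")

# fillna aligns on the index, so each missing age gets its own group's mean.
filled = df["age"].fillna(group_means)

print(group_means.tolist())  # [37.5, 37.5, 37.5, 49.5, 49.5, 49.5]
print(filled.tolist())       # [30.0, 45.0, 37.5, 76.0, 23.0, 49.5]
```

On the Titanic data the same pattern should apply with the dataset's own column names (assumed here to be "Sex" and "Age"), and the filled values can be assigned to a new column instead of overwriting the original one.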
This answer has other alternative solutions.
Hope this helps.