I'm trying to write a function that imputes missing values using either the mean or the median, depending on a choice argument.
That part works. My problem is that I want to round only the values I impute, but the way I've written it rounds every value in the column, not just the filled ones.
def conditional_impute(input_df, choice='median'):
    new_df = input_df.copy()
    if choice == 'median':
        new_df['Age'] = round(new_df.groupby(['Sex', 'Pclass'])['Age'].transform(func = lambda x: x.fillna(x.median())),1)
    elif choice == 'mean':
        new_df['Age'] = round(new_df.groupby(['Sex', 'Pclass'])['Age'].transform(func = lambda x: x.fillna(x.mean())),1)
    else:
        raise ValueError('Please choose either median or mean as your impute choice.')
    return new_df
So how can I round off only the imputed values?
You are applying round to the whole column after it has been filled, so every value gets rounded. Round only the fill value instead, i.e. the group median (or mean), before passing it to fillna:
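    if choice == 'median':
        new_df['Age'] = new_df.groupby(['Sex', 'Pclass'])['Age'].transform(func = lambda x: x.fillna(round(x.median(), 1)))
    elif choice == 'mean':
        new_df['Age'] = new_df.groupby(['Sex', 'Pclass'])['Age'].transform(func = lambda x: x.fillna(round(x.mean(), 1)))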
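For reference, here is a minimal self-contained sketch of the whole function using the same idea. It is one possible way to write it, not the only one: it computes the per-group statistic with transform, rounds that, and only then uses it to fill the gaps. The sample DataFrame is made up for illustration and is not from the question; the column names 'Sex', 'Pclass' and 'Age' are the ones used above.

```python
import pandas as pd


def conditional_impute(input_df, choice='median'):
    """Fill missing 'Age' values with the rounded group mean or median."""
    if choice not in ('median', 'mean'):
        raise ValueError('Please choose either median or mean as your impute choice.')

    new_df = input_df.copy()
    # Group statistic broadcast back to every row, aligned on the original index.
    group_stat = new_df.groupby(['Sex', 'Pclass'])['Age'].transform(choice)
    # Round only the fill values; ages that were already present are left as they are.
    new_df['Age'] = new_df['Age'].fillna(group_stat.round(1))
    return new_df


# Hypothetical sample data, invented for this example.
sample = pd.DataFrame({
    'Sex': ['male', 'male', 'male', 'female', 'female', 'female'],
    'Pclass': [1, 1, 1, 3, 3, 3],
    'Age': [22.456, 30.123, None, 40.789, None, 18.333],
})

imputed = conditional_impute(sample, choice='mean')
print(imputed)
```

Passing the string 'mean' or 'median' to transform avoids the two near-identical branches, and because fillna only touches missing entries, the existing ages keep their full precision. Note that a group with no known ages at all would still be left as NaN.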