Tags: pandas, python-3.7

Replace NaN values in one column with that column's median, computed separately for each value of another column



So, I hope you know the famous Titanic problem. I have been following a tutorial, and now I want to replace the NaN values in the Age column with a median Age. That median should not be taken over the whole column, but only over the rows that share the same value of "Title".

For example, missing Age values in rows where Title == "Mr" should be filled with the median Age of all "Mr" rows.

I tried this:

for val in data["Title"].unique():
    median_age = data.loc[data.Title == val, "Age"].median()
    data.loc[data.Title == val, "Age"].fillna(median_age, inplace=True)

But Age still shows up as NaN. How can I do this?
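The loop leaves Age unchanged because data.loc[data.Title == val, "Age"] builds a new Series, so fillna(inplace=True) fills that temporary object and the result is discarded. A minimal sketch of the same loop that works, assuming a small hypothetical frame with only Title and Age columns (the values are made up), writes the filled values back through .loc:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic frame: only Title and Age.
data = pd.DataFrame({
    "Title": ["Mr", "Mr", "Mr", "Mrs", "Mrs", "Miss"],
    "Age":   [20.0, 40.0, np.nan, 30.0, np.nan, np.nan],
})

for val in data["Title"].unique():
    mask = data["Title"] == val
    # median() skips NaN, so this is the median of the known ages for this title.
    median_age = data.loc[mask, "Age"].median()
    # Assign the filled Series back into the original frame via .loc,
    # instead of calling fillna(inplace=True) on a temporary copy.
    data.loc[mask, "Age"] = data.loc[mask, "Age"].fillna(median_age)

print(data)
```

Note that a title whose Age values are all missing (here "Miss") has no median, so its rows stay NaN.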


Solution

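  • Use combine_first to fill the NaN values with the per-group median from groupby(...).transform('median'). My dataset has no Title column, so this groups by Sex instead, but it works the same way with Title: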

    df['Age'] = df['Age'].combine_first(df.groupby('Sex')['Age'].transform('median'))
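A self-contained version of the same idea, grouped by Title as the question asks, is sketched below on a small hypothetical frame (the values are invented for illustration). transform('median') returns a Series aligned with the original index, so each row carries the median of its own group, and combine_first takes that value only where Age is missing. Passing the same aligned Series to fillna gives an identical result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; in the real Titanic frame, Title is a derived column.
df = pd.DataFrame({
    "Title": ["Mr", "Mr", "Mr", "Mrs", "Mrs", "Miss", "Miss"],
    "Age":   [20.0, 40.0, np.nan, 30.0, np.nan, 10.0, np.nan],
})

# One value per row: the median Age of that row's Title group (NaN skipped).
group_median = df.groupby("Title")["Age"].transform("median")

# Keep existing ages; use the group median only where Age is NaN.
filled = df["Age"].combine_first(group_median)

# Equivalent alternative with fillna and the same index-aligned Series.
alt = df["Age"].fillna(group_median)

df["Age"] = filled
print(df)
```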