I am working on the Titanic dataset from Kaggle and am trying to replace the NaN values in one column based on information from the other columns.
In my specific example I am trying to replace the unknown age of male, 1st class passengers with the average age of male, 1st class passengers.
How do I do this?
I have been able to segment the data and replace the null values of that new dataframe, but it doesn't carry over to the original dataframe and I am a bit unclear on how to make it do so.
Here is my code:
missingage_1stclass_male = pd.DataFrame(
    titanic[
        (titanic['Age'].isnull()) &
        (titanic['Pclass'] == 1) &
        (titanic['Sex'] == 'male')
    ]
)
missingage_1stclass_male.Age.fillna(40.5, inplace=True)
My original dataframe with all the values is named titanic.
I am trying to replace the unknown age of male, 1st class passengers with the average age of male, 1st class passengers.
You can split the problem into two steps (df below stands for your titanic dataframe). First, calculate the average age of male, 1st class passengers:
mask = (df['Pclass'] == 1) & (df['Sex'] == 'male')
avg_filler = df.loc[mask, 'Age'].mean()
Then update the values satisfying your criteria. Assigning through .loc writes into the original dataframe rather than into a separate copy:
df.loc[df['Age'].isnull() & mask, 'Age'] = avg_filler
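To see the two steps working end to end, here is a minimal sketch on a toy dataframe. The rows are made up for illustration and are not from the real Titanic data. Only the column names (Pclass, Sex, Age) are taken from the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data; the values are invented for illustration.
df = pd.DataFrame({
    'Pclass': [1, 1, 1, 3, 1],
    'Sex': ['male', 'male', 'male', 'male', 'female'],
    'Age': [30.0, 50.0, np.nan, np.nan, np.nan],
})

# Step 1: average age of male, 1st class passengers.
# Series.mean() skips NaN by default, so only 30.0 and 50.0 are averaged.
mask = (df['Pclass'] == 1) & (df['Sex'] == 'male')
avg_filler = df.loc[mask, 'Age'].mean()

# Step 2: fill only the missing ages inside that group, in place on df.
df.loc[df['Age'].isnull() & mask, 'Age'] = avg_filler

print(df)
```

Only the row for the male, 1st class passenger with a missing age is filled. The missing ages of the 3rd class male and the 1st class female are left as NaN, since they fall outside the mask.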