I have a df with a column, Critic_Score, that has NaN values. I am trying to replace each NaN with the average Critic_Score of the same platform. This question has been asked on Stack Overflow several times, and I tried 4 of the suggested solutions, but none of them gave me the desired output. How can I fix this?
This is a subset of the df:
x[['Platform','Critic_Score']].head()
  Platform  Critic_Score
0      wii          76.0
1      nes           NaN
2      wii          82.0
3      wii          80.0
4       gb           NaN
More information on the original df:
x.head().to_dict('list')
{'Name': ['wii sports',
'super mario bros.',
'mario kart wii',
'wii sports resort',
'pokemon red/pokemon blue'],
'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
'Year_of_Release': [2006.0, 1985.0, 2008.0, 2009.0, 1996.0],
'Genre': ['sports', 'platform', 'racing', 'sports', 'role-playing'],
'NA_sales': [41.36, 29.08, 15.68, 15.61, 11.27],
'EU_sales': [28.96, 3.58, 12.76, 10.93, 8.89],
'JP_sales': [3.77, 6.81, 3.79, 3.28, 10.22],
'Other_sales': [8.45, 0.77, 3.29, 2.95, 1.0],
'Critic_Score': [76.0, nan, 82.0, 80.0, nan],
'User_Score': ['8', nan, '8.3', '8', nan],
'Rating': ['E', nan, 'E', 'E', nan]}
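To make the problem reproducible, the sample above can be rebuilt as a DataFrame. This is a minimal sketch that assumes pandas and NumPy are installed and simply writes the printed nan values as np.nan:

```python
import numpy as np
import pandas as pd

# Rebuild the five sample rows from the to_dict('list') output above;
# the printed `nan` values become np.nan.
x = pd.DataFrame({
    'Name': ['wii sports', 'super mario bros.', 'mario kart wii',
             'wii sports resort', 'pokemon red/pokemon blue'],
    'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
    'Year_of_Release': [2006.0, 1985.0, 2008.0, 2009.0, 1996.0],
    'Genre': ['sports', 'platform', 'racing', 'sports', 'role-playing'],
    'NA_sales': [41.36, 29.08, 15.68, 15.61, 11.27],
    'EU_sales': [28.96, 3.58, 12.76, 10.93, 8.89],
    'JP_sales': [3.77, 6.81, 3.79, 3.28, 10.22],
    'Other_sales': [8.45, 0.77, 3.29, 2.95, 1.0],
    'Critic_Score': [76.0, np.nan, 82.0, 80.0, np.nan],
    'User_Score': ['8', np.nan, '8.3', '8', np.nan],
    'Rating': ['E', np.nan, 'E', 'E', np.nan],
})

# Show the two relevant columns and count the missing critic scores.
print(x[['Platform', 'Critic_Score']])
print(x['Critic_Score'].isna().sum())  # 2
```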
These are the statements I tried, each followed by its output:
1.
x['Critic_Score'] = x['Critic_Score'].fillna(x.groupby('Platform')['Critic_Score'].transform('mean'), inplace = True)
0 None
1 None
2 None
3 None
4 None
Name: Critic_Score, dtype: object
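A minimal sketch of the same fill written without inplace=True, using only the two relevant sample columns. The comments mark where the None values above come from:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
    'Critic_Score': [76.0, np.nan, 82.0, 80.0, np.nan],
})

platform_mean = x.groupby('Platform')['Critic_Score'].transform('mean')

# Without inplace=True, fillna returns a new Series, which is what should be
# assigned back. (With inplace=True it returned None, as the output above shows.)
x['Critic_Score'] = x['Critic_Score'].fillna(platform_mean)

print(x['Critic_Score'].dtype)  # float64
# nes and gb remain NaN here: those groups have no score to average.
print(x['Critic_Score'].tolist())
```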
2.
x.loc[x.Critic_Score.isnull(), 'Critic_Score'] = x.groupby('Platform').Critic_Score.transform('mean')
# no change in column
0 76.0
1 NaN
2 82.0
3 80.0
4 NaN
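The loc assignment itself is fine; the quickest check is to print the per-platform means it assigns. This sketch, on the two relevant sample columns, shows that the nes and gb means are NaN, so NaN is written back over NaN:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
    'Critic_Score': [76.0, np.nan, 82.0, 80.0, np.nan],
})

# Per-row mean of each row's platform. NaNs are skipped inside a group,
# but a group made only of NaNs has a NaN mean.
platform_mean = x.groupby('Platform')['Critic_Score'].transform('mean')
print(platform_mean)

# Same assignment as the attempt above: fill only the missing rows.
x.loc[x['Critic_Score'].isnull(), 'Critic_Score'] = platform_mean
print(x['Critic_Score'].isna().sum())  # still 2
```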
3.
x['Critic_Score'] = x.groupby('Platform')['Critic_Score']\
    .transform(lambda y: y.fillna(y.mean()))
# no change in column
0 76.0
1 NaN
2 82.0
3 80.0
4 NaN
Name: Critic_Score, dtype: float64
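The transform pattern does fill NaNs once a platform has at least one score. The sketch below adds two hypothetical rows (an nes game scored 70 and a gb game scored 90, invented purely for illustration) so that each group has something to average:

```python
import numpy as np
import pandas as pd

# The last two rows are hypothetical, added only so nes and gb have a score.
x = pd.DataFrame({
    'Platform': ['wii', 'nes', 'wii', 'wii', 'gb', 'nes', 'gb'],
    'Critic_Score': [76.0, np.nan, 82.0, 80.0, np.nan, 70.0, 90.0],
})

# Inside each platform group, replace NaN with that group's mean.
x['Critic_Score'] = (
    x.groupby('Platform')['Critic_Score']
     .transform(lambda y: y.fillna(y.mean()))
)
print(x['Critic_Score'].tolist())
# [76.0, 70.0, 82.0, 80.0, 90.0, 70.0, 90.0]
```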
4.
x['Critic_Score'] = x.groupby('Platform')['Critic_Score'].apply(lambda y: y.fillna(y.mean()))
x['Critic_Score'].head()
0 76.0
1 NaN
2 82.0
3 80.0
4 NaN
Name: Critic_Score, dtype: float64
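The apply version behaves the same way on this data. Depending on the pandas version, groupby(...).apply may add the Platform key as an extra index level (pandas 2.x honors the default group_keys=True here), which breaks alignment when the result is assigned back to x. Passing group_keys=False keeps the original row labels. A sketch, again with two hypothetical rows added so that nes and gb have a score:

```python
import numpy as np
import pandas as pd

# The last two rows are hypothetical, added only so nes and gb have a score.
x = pd.DataFrame({
    'Platform': ['wii', 'nes', 'wii', 'wii', 'gb', 'nes', 'gb'],
    'Critic_Score': [76.0, np.nan, 82.0, 80.0, np.nan, 70.0, 90.0],
})

# group_keys=False stops apply from prepending Platform to the index,
# so the result aligns with x's original index on assignment.
filled = (
    x.groupby('Platform', group_keys=False)['Critic_Score']
     .apply(lambda y: y.fillna(y.mean()))
)
x['Critic_Score'] = filled
print(x['Critic_Score'].tolist())
# [76.0, 70.0, 82.0, 80.0, 90.0, 70.0, 90.0]
```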
# Fill only the NaN scores with the mean score of the same platform
x.update(
    x.groupby('Platform').Critic_Score.transform('mean'),
    overwrite=False)
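First, groupby('Platform').Critic_Score.transform('mean') creates a Series with the same number of rows as x, holding the platform average on every row.
Then update(..., overwrite=False) uses that Series to fill only the cells of the original that are NaN; existing scores are left untouched.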
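The two steps can be checked on a slightly larger frame. This sketch adds two hypothetical rows (an nes score of 70 and a gb score of 90, invented for illustration) so those platforms have something to average. DataFrame.update works in place and returns None, and a Series passed to it is matched to a column by its name, which here is Critic_Score:

```python
import numpy as np
import pandas as pd

# The last two rows are hypothetical, added only so nes and gb have a score.
x = pd.DataFrame({
    'Platform': ['wii', 'nes', 'wii', 'wii', 'gb', 'nes', 'gb'],
    'Critic_Score': [76.0, np.nan, 82.0, 80.0, np.nan, 70.0, 90.0],
})

# Step 1: one value per row, the mean Critic_Score of that row's platform.
platform_mean = x.groupby('Platform')['Critic_Score'].transform('mean')

# Step 2: fill only the NaN cells of x; existing scores are kept.
x.update(platform_mean, overwrite=False)
print(x['Critic_Score'].tolist())
# [76.0, 70.0, 82.0, 80.0, 90.0, 70.0, 90.0]
```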
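Bear in mind that your sample has only one nes row and one gb row, both with a NaN score, so there is nothing to average: the mean of those groups is itself NaN, which is why the loc, transform and apply attempts leave those rows as NaN. The None values in the first attempt have a different cause: fillna(..., inplace=True) returns None, and that None is assigned back to the column.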
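If rows whose platform has no scores at all should still be filled, one option (an assumption about what is wanted, not part of the answer above) is to fall back to the overall mean after the per-platform fill. A sketch on the original five sample rows:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({
    'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
    'Critic_Score': [76.0, np.nan, 82.0, 80.0, np.nan],
})

platform_mean = x.groupby('Platform')['Critic_Score'].transform('mean')
overall_mean = x['Critic_Score'].mean()  # mean of the non-NaN scores

# Try the platform mean first; rows still NaN (nes, gb) get the overall mean.
x['Critic_Score'] = x['Critic_Score'].fillna(platform_mean).fillna(overall_mean)
print(x['Critic_Score'].round(2).tolist())
```

The fallback value is a design choice; a median or a fixed placeholder would slot into the second fillna call the same way.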