Search code examples
pythonpandasgroup-bynanisnullorempty

Conditional NaN filling not changing column or making all None


I have a df with a column, Critic_Score, that has NaN values. I am trying to replace them with the average of the Critic Scores from the same platform. This question has been asked on stack overflow several times and I used 4 suggestions that did not give me the desired output. Please tell me how to fix this.

This is a subset of the df:

x[['Platform','Critic_Score']].head()

Platform    Critic_Score
0   wii 76.0
1   nes NaN
2   wii 82.0
3   wii 80.0
4   gb  NaN

More information on the original df:

x.head().to_dict('list')
{'Name': ['wii sports',
  'super mario bros.',
  'mario kart wii',
  'wii sports resort',
  'pokemon red/pokemon blue'],
 'Platform': ['wii', 'nes', 'wii', 'wii', 'gb'],
 'Year_of_Release': [2006.0, 1985.0, 2008.0, 2009.0, 1996.0],
 'Genre': ['sports', 'platform', 'racing', 'sports', 'role-playing'],
 'NA_sales': [41.36, 29.08, 15.68, 15.61, 11.27],
 'EU_sales': [28.96, 3.58, 12.76, 10.93, 8.89],
 'JP_sales': [3.77, 6.81, 3.79, 3.28, 10.22],
 'Other_sales': [8.45, 0.77, 3.29, 2.95, 1.0],
 'Critic_Score': [76.0, nan, 82.0, 80.0, nan],
 'User_Score': ['8', nan, '8.3', '8', nan],
 'Rating': ['E', nan, 'E', 'E', nan]}

These are the statements I tried followed by their output:

1.

x['Critic_Score'] = x['Critic_Score'].fillna(x.groupby('Platform')['Critic_Score'].transform('mean'), inplace = True)

0    None
1    None
2    None
3    None
4    None
Name: Critic_Score, dtype: object
x.loc[x.Critic_Score.isnull(), 'Critic_Score'] = x.groupby('Platform').Critic_Score.transform('mean')
#no change in column
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
x['Critic_Score'] = x.groupby('Platform')['Critic_Score']\
    .transform(lambda y: y.fillna(y.mean()))
#no change in column
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
Name: Critic_Score, dtype: float64
x['Critic_Score']=x.groupby('Platform')['Critic_Score'].apply(lambda y:y.fillna(y.mean()))
​
x['Critic_Score'].head()
​

Out[73]:
0    76.0
1     NaN
2    82.0
3    80.0
4     NaN
Name: Critic_Score, dtype: float64

Solution

  • x.update(
        x.groupby('Platform').Critic_Score.transform('mean'),
        overwrite=False)
    
    • First you create a new df with the same number of rows but with the platform average on every row.

    • Then use that to update the original

    Bear in mind your sample has only one row of nes and another of gb, both with nan score, so there is nothing to be averaged