python pandas dataframe mean pandas-groupby

pandas groupby mean with nan


I have the following dataframe:

date id  cars
2012 1    4  
2013 1    6
2014 1    NaN    
2012 2    10 
2013 2    20 
2014 2    NaN  

Now I want to get the mean of cars over the years for each id, ignoring NaNs. The result should look like this:

date id  cars  result
2012 1    4      5
2013 1    6      5
2014 1    NaN    5
2012 2    10     15
2013 2    20     15
2014 2    NaN    15

I have the following command:

df["result"]=df.groupby("id")["cars"].mean()

The command runs without errors, but the result column contains only NaNs. What did I do wrong?


Solution

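  • groupby("id")["cars"].mean() returns one value per id, indexed by id, so it does not line up with the original rows when assigned as a column. Use transform instead, which returns a Series the same size as the original: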

    df["result"] = df.groupby("id")["cars"].transform("mean")
    print(df)
       date  id  cars  result
    0  2012   1   4.0     5.0
    1  2013   1   6.0     5.0
    2  2014   1   NaN     5.0
    3  2012   2  10.0    15.0
    4  2013   2  20.0    15.0
    5  2014   2   NaN    15.0
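
  • For reference, here is a minimal self-contained sketch that rebuilds the example frame and compares the two calls. The names per_group and result_via_map are illustrative only and do not come from the question; the map-based variant is an alternative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "date": [2012, 2013, 2014, 2012, 2013, 2014],
    "id": [1, 1, 1, 2, 2, 2],
    "cars": [4, 6, np.nan, 10, 20, np.nan],
})

# Aggregating returns one value per group, indexed by id (1 and 2),
# not by the original row labels (0..5).
per_group = df.groupby("id")["cars"].mean()

# transform broadcasts each group's mean back to every row of that group,
# so the result shares df's index and can be assigned directly.
df["result"] = df.groupby("id")["cars"].transform("mean")

# Alternative (illustrative): look up each row's id in the per-group means.
df["result_via_map"] = df["id"].map(per_group)

print(per_group)
print(df)
```

    No extra NaN handling is needed here: the groupby mean skips NaN values by default, so each id's mean is computed from its non-missing rows only.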