Tags: python, pandas, dataframe, fillna

How to fill NaN values in each pandas column with the mean of that column's class


I have a dataset in pandas with, say, two classes:

 index | length | weight | label 
-------|--------|--------|-------
   0       1         2       0
   1       2         3       0
   2      nan        4       0
   3       6        nan      0
   4       30        40      1
   5       45        35      1
   6       18       nan      1

df.fillna(df.mean()) returns a DataFrame in which each NaN is filled with the mean of its column. But I want to fill each NaN with the mean of its class within that column, so length at index 2 would be 3. The desired output looks like this:

 index | length | weight | label 
-------|--------|--------|-------
   0       1         2       0
   1       2         3       0
   2       3         4       0
   3       6         3       0
   4       30        40      1
   5       45        35      1
   6       18       37.5     1

Is there a simple function for this, or should I implement it myself?


Solution

  • Use GroupBy.transform with 'mean' to build a helper DataFrame of per-group means aligned to the original index, then pass it to fillna:

    df = df.fillna(df.groupby('label').transform('mean'))
    print(df)
       length  weight  label
    0     1.0     2.0      0
    1     2.0     3.0      0
    2     3.0     4.0      0
    3     6.0     3.0      0
    4    30.0    40.0      1
    5    45.0    35.0      1
    6    18.0    37.5      1 
    

    Detail: the helper DataFrame holds, for every row, the mean of that row's class in each column:

    print(df.groupby('label').transform('mean'))
       length  weight
    0     3.0     3.0
    1     3.0     3.0
    2     3.0     3.0
    3     3.0     3.0
    4    31.0    37.5
    5    31.0    37.5
    6    31.0    37.5
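
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question is rebuilt by hand; the variable names class_means, filled, and global_filled are illustrative and not part of the original answer. It also computes the plain df.fillna(df.mean()) result for contrast.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "length": [1, 2, np.nan, 6, 30, 45, 18],
    "weight": [2, 3, 4, np.nan, 40, 35, np.nan],
    "label": [0, 0, 0, 0, 1, 1, 1],
})

# Per-class means, broadcast back onto the original row index.
# The grouping column 'label' is not part of the result.
class_means = df.groupby("label").transform("mean")

# fillna aligns on index and columns, so only NaN cells are replaced
# and 'label' (absent from class_means) is left untouched.
filled = df.fillna(class_means)

# For contrast: the whole-column mean ignores the class.
global_filled = df.fillna(df.mean())

print(filled)
print(global_filled)
```

With the class-wise version, length at index 2 becomes 3.0 (the mean of 1, 2 and 6), whereas the whole-column version fills it with 17.0, the mean over both classes.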