Search code examples
pythonpandasmissing-dataweighted-average

Pandas: filling missing values by weighted average in each group


I have a dataFrame where 'value'column has missing values. I'd like to filling missing values by weighted average within each 'name' group. There was post on how to fill the missing values by simple average in each group but not weighted average. Thanks a lot!

df = pd.DataFrame({'value': [1, np.nan, 3, 2, 3, 1, 3, np.nan, np.nan],'weight':[3,1,1,2,1,2,2,1,1], 'name': ['A','A', 'A','B','B','B', 'C','C','C']})


   name  value  weight
0    A    1.0       3
1    A    NaN       1
2    A    3.0       1
3    B    2.0       2
4    B    3.0       1
5    B    1.0       2
6    C    3.0       2
7    C    NaN       1
8    C    NaN       1

I'd like to fill in "NaN" with weighted value in each "name" group, i.e.

   name  value  weight
0    A    1.0       3
1    A    1.5       1
2    A    3.0       1
3    B    2.0       2
4    B    3.0       1
5    B    1.0       2
6    C    3.0       2
7    C    3.0       1
8    C    3.0       1

Solution

  • You can group data frame by name, and use fillna method to fill the missing values with weighted average which can calculated with np.average with weights parameter:

    df['value'] = (df.groupby('name', group_keys=False)
                     .apply(lambda g: g.value.fillna(np.average(g.dropna().value, weights=g.dropna().weight))))
    
    df
    #name   value   weight
    #0  A    1.0    3
    #1  A    1.5    1
    #2  A    3.0    1
    #3  B    2.0    2
    #4  B    3.0    1
    #5  B    1.0    2
    #6  C    3.0    2
    #7  C    3.0    1
    #8  C    3.0    1
    

    To make this less convoluted, define a fillValue function:

    import numpy as np
    import pandas as pd
    
    def fillValue(g):
        gNotNull = g.dropna()
        wtAvg = np.average(gNotNull.value, weights=gNotNull.weight)
        return g.value.fillna(wtAvg)
    
    df['value'] = df.groupby('name', group_keys=False).apply(fillValue)