python pandas group-by pandas-groupby fillna

How to replace NaN values with the group mean using pandas groupby


I tried the following to replace the NaN values in the column feature_count (an integer that ranges from 1 to 10) with group means, using groupby on client_id or client_name. However, the NaN values are still there.

df['feature_count'].isnull().sum()

The output is :

2254

Now I use:

df['feature_count'].fillna(df.groupby('client_name')['feature_count'].mean(), inplace=True)

But the output remains the same :

df['feature_count'].isnull().sum()

2254

Is there another way to replace the NaN values with the mean of the non-NaN values in the column, grouped by client?


Solution

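  • df.groupby('client_name')['feature_count'].mean() returns a series indexed by client_name.

    fillna with a series aligns on index labels, and the row labels of df do not match the client names in that series, so nothing gets filled. Instead, you want to replace null values with the mean mapped from that series onto each row's client_name.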

    Therefore, you can use the following:

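    s = df.groupby('client_name')['feature_count'].mean()
    # assign the result back; inplace=True on a column selection may not update df in recent pandas
    df['feature_count'] = df['feature_count'].fillna(df['client_name'].map(s))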
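
To see why the original attempt leaves the NaN count unchanged, here is a minimal sketch on made-up data. Only the column names come from the question; the values and the variable names `unchanged` and `filled` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "client_name": ["a", "a", "b", "b", "b"],
    "feature_count": [2.0, np.nan, 4.0, 8.0, np.nan],
})

# Group means, indexed by client_name ("a", "b"), not by row label (0..4).
s = df.groupby("client_name")["feature_count"].mean()

# fillna aligns on index labels; none of 0..4 appear in s, so nothing is filled.
unchanged = df["feature_count"].fillna(s)
print(unchanged.isnull().sum())  # 2

# map() looks up each row's client_name in s, giving a row-aligned series.
filled = df["feature_count"].fillna(df["client_name"].map(s))
print(filled.tolist())  # [2.0, 2.0, 4.0, 8.0, 6.0]
```

Note that the filled column is float, since a group mean is generally not an integer.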
    

    Even more Pandorable would be to use GroupBy.transform, which returns the group means already aligned with the original rows, so the mapping part is handled for you:

    s = df.groupby('client_name')['feature_count'].transform('mean')
    # assign the result back rather than relying on inplace=True
    df['feature_count'] = df['feature_count'].fillna(s)
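
One case the transform approach does not cover is a client with no observed values at all: its group mean is itself NaN, so those rows stay NaN. The sketch below shows this on made-up data; the fallback to the overall column mean is an assumption for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; client "c" has no observed feature_count at all.
df = pd.DataFrame({
    "client_name": ["a", "a", "b", "b", "c"],
    "feature_count": [2.0, np.nan, 4.0, np.nan, np.nan],
})

# transform("mean") broadcasts each group's mean back onto the original rows.
group_mean = df.groupby("client_name")["feature_count"].transform("mean")
df["feature_count"] = df["feature_count"].fillna(group_mean)
print(df["feature_count"].tolist())  # [2.0, 2.0, 4.0, 4.0, nan]

# Assumed fallback (not from the original answer): use the overall mean
# for clients whose group mean is itself NaN.
df["feature_count"] = df["feature_count"].fillna(df["feature_count"].mean())
print(df["feature_count"].tolist())  # [2.0, 2.0, 4.0, 4.0, 3.0]
```

Whether an overall mean is a sensible fallback depends on the data; dropping those rows or leaving them as NaN may be more appropriate.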