I tried using this to replace the NaN values in the column feature_count (it's an integer that ranges from 1 to 10), grouping by client_id or client_name; however, the NaN values do not go away.
df['feature_count'].isnull().sum()
The output is:
2254
Now I use:
df['feature_count'].fillna(df.groupby('client_name')['feature_count'].mean(), inplace=True)
But the output remains the same:
df['feature_count'].isnull().sum()
2254
Is there another way to replace the NaN values with the mean of the non-NaN values in the column, grouped by client?
df.groupby('client_name')['feature_count'].mean()
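returns a Series indexed by client_name.
fillna aligns a Series argument on index labels, and those client names do not match the row labels of df, so nothing gets filled. You aren't looking to replace null values with that Series directly. Instead, you want to replace them with the mean mapped from that Series onto each row.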
Therefore, you can use the following:
s = df.groupby('client_name')['feature_count'].mean()
df['feature_count'] = df['feature_count'].fillna(df['client_name'].map(s))
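To see the difference on a small frame, here is a minimal sketch. The toy values and the names unchanged and filled are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not from the question).
df = pd.DataFrame({
    "client_name": ["a", "a", "a", "b", "b"],
    "feature_count": [2.0, np.nan, 4.0, np.nan, 6.0],
})

# Per-client means; the result is indexed by client_name ("a", "b").
s = df.groupby("client_name")["feature_count"].mean()

# fillna aligns on index labels. df's index is 0..4, so no label in s
# matches and the NaN values are left in place.
unchanged = df["feature_count"].fillna(s)

# map() looks up each row's client_name in s, producing a Series that
# shares df's index, so fillna can line the values up row by row.
filled = df["feature_count"].fillna(df["client_name"].map(s))

print(unchanged.isna().sum())
print(filled.tolist())
```

Assigning the result back to the column, rather than relying on inplace=True on a column selection, should also avoid the chained-assignment behavior of newer pandas versions.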
Even more Pandorable would be to use GroupBy.transform, which handles the mapping part for you:
s = df.groupby('client_name')['feature_count'].transform('mean')
df['feature_count'] = df['feature_count'].fillna(s)
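A quick way to check the transform version is a toy frame like the one below. This is only a sketch: the data and the names group_mean and remaining are hypothetical. It also shows one case worth watching for: a client whose values are all NaN has no mean to fill with, so those rows stay NaN.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical); client "c" has no non-NaN value at all.
df = pd.DataFrame({
    "client_name": ["a", "a", "b", "b", "c"],
    "feature_count": [2.0, np.nan, np.nan, 6.0, np.nan],
})

# transform("mean") returns one value per row, on df's own index,
# so it can be passed straight to fillna.
group_mean = df.groupby("client_name")["feature_count"].transform("mean")
df["feature_count"] = df["feature_count"].fillna(group_mean)

# Rows for client "c" are still NaN because that group's mean is NaN.
remaining = int(df["feature_count"].isna().sum())
print(df)
print(remaining)
```

If some NaN values remain after this step, a second fillna with a fallback of your choosing (for example the overall column mean) is one possible follow-up.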