If I have two groups, 0 and 1, in a column labeled "Group Label", how can I impute missing values in every other column with the mean of that row's group, rather than the mean of the entire column?
This is the code I have so far. It splits the DataFrame into two groups, but it is not calculating the correct mean:
df1 = df.groupby("group_label").transform(lambda x: x.fillna(x.mean()))
Also, it seems to be dropping string columns such as my ID column.
Thanks in advance
You may use .transform('mean') on the grouped DataFrame inside .fillna(), but you need to specify the columns you want to apply it to:
# for a single column
df['col1'] = df['col1'].fillna(
    df.groupby('Group_Label')['col1'].transform('mean'))
# for multiple columns
df[['col1', 'col2', ...]] = df[['col1', 'col2', ...]].fillna(
    df.groupby('Group_Label')[['col1', 'col2', ...]].transform('mean'))
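If you would rather not list the columns by hand, here is a minimal sketch of the same approach applied to every numeric column at once. The toy data and the column names (ID, Group_Label, col1, col2) are made up for illustration, so swap in your own. Because the result is assigned back into the original DataFrame, the ID and group columns are left untouched, and restricting to numeric columns avoids taking a mean of strings.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "d", "e", "f"],
    "Group_Label": [0, 0, 0, 1, 1, 1],
    "col1": [1.0, np.nan, 3.0, 10.0, np.nan, 30.0],
    "col2": [np.nan, 4.0, 6.0, 100.0, 200.0, np.nan],
})

# Numeric columns to impute, excluding the grouping column itself.
num_cols = (
    df.select_dtypes(include="number")
      .columns.drop("Group_Label")
      .tolist()
)

# One mean per row, taken from that row's group and aligned to df's index.
group_means = df.groupby("Group_Label")[num_cols].transform("mean")

# Fill only the missing cells; existing values are left as they are.
df[num_cols] = df[num_cols].fillna(group_means)

print(df)
```

With this data, the missing col1 value in group 0 is filled with the mean of 1.0 and 3.0, and the one in group 1 with the mean of 10.0 and 30.0, instead of the mean of the whole column.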