Tags: python, pandas, mean, imputation

How can I impute every column in a DataFrame with its respective class mean?


If I have two groups, 0 and 1, in a column labeled "Group Label", how can I fill the missing values in every other column with the mean of the row's group, rather than the mean of the entire column?

This is the code I have so far. It splits the DataFrame into the two groups, but it does not calculate the correct mean:

    df1 = df.groupby("group_label").transform(lambda x: x.fillna(x.mean()))

Also, it seems to be dropping string columns such as my ID column.

Thanks in advance
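The ID column most likely goes missing for two reasons: groupby(...).transform never returns the grouping key itself, and the lambda calls x.mean(), which has no meaning for a string column (depending on the pandas version, such columns are either dropped silently or raise a TypeError). A minimal sketch, assuming the columns to impute are the numeric ones: it keeps the question's lambda but applies it only to those columns and writes the result back into df, so ID and group_label survive. The data and the column names ID, x and y are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'ID' is a string column, 'group_label' holds 0/1.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "d"],
    "group_label": [0, 0, 1, 1],
    "x": [1.0, np.nan, 10.0, np.nan],
    "y": [np.nan, 4.0, 6.0, 8.0],
})

# Impute only the numeric, non-key columns: transform() never returns the
# grouping key, and a string column like 'ID' has no mean to fill with.
num_cols = df.select_dtypes("number").columns.drop("group_label").tolist()

# Same lambda as in the question, but assigned back into df so that
# 'ID' and 'group_label' are kept alongside the imputed columns.
df[num_cols] = df.groupby("group_label")[num_cols].transform(
    lambda s: s.fillna(s.mean()))
```

Here each NaN in x and y is replaced by the mean of the non-missing values in its own group, not by the overall column mean.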


Solution

  • You can use .transform('mean') on the grouped DataFrame inside .fillna(), but you need to specify the columns you want to apply it to:

    # for a single column: fill each NaN with the mean of its row's group
    df['col1'] = df['col1'].fillna(
        df.groupby('Group_Label')['col1'].transform('mean'))
    
    # for multiple columns: list every column you want to impute
    cols = ['col1', 'col2']  # extend with your other numeric columns
    df[cols] = df[cols].fillna(
        df.groupby('Group_Label')[cols].transform('mean'))
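To impute every column without typing the list by hand, the same fillna/transform('mean') pattern can be combined with select_dtypes. A minimal sketch, assuming all numeric columns other than the group key should be imputed; the toy data and the ID, col1 and col2 columns are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 'Group_Label' matches the answer above,
# and 'ID' is a string column that should stay untouched.
df = pd.DataFrame({
    "ID": ["a", "b", "c", "d", "e"],
    "Group_Label": [0, 0, 0, 1, 1],
    "col1": [1.0, 3.0, np.nan, 10.0, np.nan],
    "col2": [np.nan, 2.0, 4.0, np.nan, 7.0],
})

# Every numeric column except the group key itself.
cols = df.select_dtypes("number").columns.drop("Group_Label").tolist()

# transform('mean') gives each row its group's mean, aligned to df's index;
# fillna uses those values only where the original cell is NaN.
group_means = df.groupby("Group_Label")[cols].transform("mean")
df[cols] = df[cols].fillna(group_means)

print(df)
```

Because fillna only touches NaN cells, existing values are left as they are, and string columns such as ID are never passed to transform, so they remain in df unchanged.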