Search code examples
pandasaggregate

Pandas how to calculate sum for columns in list x and mean for columns in list y using aggregate


I have a dataframe with 60+ columns. I want to calculate sum for the column names starting with 'FLAG_' and mean for the column names starting with 'RESPONSE_' . How to achieve this? I tried using agg() with no luck.

Note: I'm trying to achieve this as part of data pipeline

sum_cols = [col for col in df.columns if 'FLAG_' in col]
mean_cols = [col for col in df.columns if 'RESPONSE_' in col]
dict_ = {tuple(sum_cols): 'sum', tuple(mean_cols): 'mean'}
df = df.groupby('ID').agg(dict_)

but I'm getting error because while selecting the columns it should be a list rather tuple. since it's a dictionary I cannot pass list as key.

KeyError: "Column(s) [array([#<All the FLAG_ columns listed>], dtype=object)] do not exist"

Solution

  •     sum_cols = [col for col in df.columns if 'FLAG_' in col]
        mean_cols = [col for col in df.columns if 'RESPONSE_' in col]
        dict_ = {col:'sum' for col in sum_cols}
        dict_.update({col:'mean' for col in mean_cols})
        df = df.groupby('ID').agg(dict_)