I have a dataframe with 60+ columns. I want to calculate sum for the column names starting with 'FLAG_' and mean for the column names starting with 'RESPONSE_' . How to achieve this? I tried using agg() with no luck.
Note: I'm trying to achieve this as part of data pipeline
sum_cols = [col for col in df.columns if 'FLAG_' in col]
mean_cols = [col for col in df.columns if 'RESPONSE_' in col]
dict_ = {tuple(sum_cols): 'sum', tuple(mean_cols): 'mean'}
df = df.groupby('ID').agg(dict_)
but I'm getting error because while selecting the columns it should be a list rather tuple. since it's a dictionary I cannot pass list as key.
KeyError: "Column(s) [array([#<All the FLAG_ columns listed>], dtype=object)] do not exist"
sum_cols = [col for col in df.columns if 'FLAG_' in col]
mean_cols = [col for col in df.columns if 'RESPONSE_' in col]
dict_ = {col:'sum' for col in sum_cols}
dict_.update({col:'mean' for col in mean_cols})
df = df.groupby('ID').agg(dict_)