Search code examples
pythonpandasdataframegroup-byaggregate

Groupby and Aggregate 2 columns in pandas


I would like to summarize by dataset based on two columns and get both the 'sum' and 'count' of these columns. I know how to get the 'sum' and 'count' of the columns separately, for example:

summary = df.groupby(['country','item', "date", 'msgtype'], as_index=False)['size', 'turnover'].sum()

But I want to be able to do it in one line if possible, without having to do merges etc.

I tried to do this:

summary = df.groupby(['country','item', "date", 'order'], as_index=False).agg({(['size', 'weight']) : ['sum' , 'count']})

But I get the error:

unhashable type: 'list'

So I'm not too sure what to do.


Solution

  • Try this.

    # aggregate size and weight by country, item, date and order
    summary = df.groupby(['country','item', "date", 'order'])[['size', 'weight']].agg(['sum' , 'count'])
    # flatten column names
    summary.columns = [f"{x}_{y}" for x,y in summary.columns]
    # make groupers columns
    summary = summary.reset_index()
    

    It groups by country, item, date and order and aggregates size and weight columns by their respective sum and counts.