Search code examples
pythonpandasdataframeconcatenationpandas-groupby

How to efficiently aggregate by segments in Python


I'm looking to find an alternate way to group "total" groupings. Below is over simplified example of what I'm currently doing to create a total grouping. Currently, I'm creating a separate "total" data frame and merging it with the original one. Is there a better way to approach this problem?

import pandas as pd
df = pd.DataFrame({'new_ren':['new', 'ren', 'new', 'new', 'ren'],
                   'dt':['2022-01-01', '2022-01-01', '2022-02-01', '2022-02-01', '2022-03-01'],
                   'ct':[1, 3, 4, 1, 2]})

# create total grouping
df_tot = df.groupby('dt')['ct'].sum().reset_index()
df_tot['new_ren'] = 'total'

# combined both dataframes
pd.concat([df, df_tot])


Solution

  • There's nothing wrong with your approach (I'd say, it's the most intuitive and readable for your current problem) but here's an alternative way using merge to get the same output:

    out = df.merge(df.groupby('dt', as_index=False).sum('ct').assign(new_ren='total'), how='outer')
    

    Output:

      new_ren          dt  ct
    0     new  2022-01-01   1
    1     ren  2022-01-01   3
    2     new  2022-02-01   4
    3     new  2022-02-01   1
    4     ren  2022-03-01   2
    5   total  2022-01-01   4
    6   total  2022-02-01   5
    7   total  2022-03-01   2