Search code examples
pythonpandasdataframecounter

Pandas groupby and count across multiple columns


I have data ordered by ID, Year, and then a series of event flags indicating whether a thing did or did not happen for that ID in that year:

ID Year x y z
1 2015 0 1 0
1 2016 1 1 0
1 2017 0 1 1
2 2015 1 0 1
2 2016 1 1 0
2 2017 0 1 1

I'd like to group by ID and Year and apply a cumulative count to each "event" column, such that I'm left with something like the following

ID Year x_total y_total z_total
1 2015 0 1 0
1 2016 1 2 0
1 2017 1 3 1
2 2015 1 0 1
2 2016 2 1 1
2 2017 2 2 2

I've looked at various options using cumsum and cumcount but I can't seem to figure this out.


Solution

  • You can use .groupby() + .cumsum() to get the cumulative count to each "event" column. Then add _total as suffix to the column names by .add_suffix() and then join with the first 2 columns:

    df[['ID', 'Year']].join(df.groupby('ID')[['x', 'y', 'z']].cumsum().add_suffix('_total'))
    

    Result:

       ID  Year  x_total  y_total  z_total
    0   1  2015        0        1        0
    1   1  2016        1        2        0
    2   1  2017        1        3        1
    3   2  2015        1        0        1
    4   2  2016        2        1        1
    5   2  2017        2        2        2