Search code examples
pythonpandasdataframegroup-bypandas-apply

How do I sum a column based on separate category types, and preserve the zeros?


Here is my dataframe:

Id Category Hours
1 A 1
1 A 3
1 B 4
2 A 2
2 B 6
3 A 3

And here is the output I want:

Id Total Hours A_Hours B_Hours
1 5 4 4
2 8 2 6
3 3 3 0

How do I achieve this?

I tried various methods of grouping and aggregation, even calculating the A_hours series separately and appending it to the dataframe but I didn't find a way to both calculate the zeros ( based on absence of hours under a particular category) and maintain the order.


Solution

  • Use a pivot_table:

    out = (df.pivot_table(index='Id', columns='Category', values='Hours',
                          aggfunc='sum', fill_value=0,
                          margins=True, margins_name='Total')
             .add_suffix('_Hours')
             .drop('Total').reset_index().rename_axis(columns=None)
          )
    

    Output:

       Id  A_Hours  B_Hours  Total_Hours
    0   1        4        4            8
    1   2        2        6            8
    2   3        3        0            3