Here is my dataframe:
Id | Category | Hours |
---|---|---|
1 | A | 1 |
1 | A | 3 |
1 | B | 4 |
2 | A | 2 |
2 | B | 6 |
3 | A | 3 |
And here is the output I want:
Id | Total Hours | A_Hours | B_Hours |
---|---|---|---|
1 | 5 | 4 | 4 |
2 | 8 | 2 | 6 |
3 | 3 | 3 | 0 |
How do I achieve this?
I tried various methods of grouping and aggregation, even calculating the A_hours series separately and appending it to the dataframe but I didn't find a way to both calculate the zeros ( based on absence of hours under a particular category) and maintain the order.
Use a pivot_table
:
out = (df.pivot_table(index='Id', columns='Category', values='Hours',
aggfunc='sum', fill_value=0,
margins=True, margins_name='Total')
.add_suffix('_Hours')
.drop('Total').reset_index().rename_axis(columns=None)
)
Output:
Id A_Hours B_Hours Total_Hours
0 1 4 4 8
1 2 2 6 8
2 3 3 0 3