Say I have the following data:
| student | performance | competititon | month |
|--------------|-------------|--------------|---------|
| Aardvark | 11 | A | 2019-01 |
| Aardvark | 13 | B | 2019-01 |
| Jason Derulo | 11 | A | 2019-01 |
| Jason Derulo | 15 | B | 2019-01 |
| Jason Derulo | 22 | C | 2019-02 |
| JP Saxe | 11 | A | 2019-01 |
| JP Saxe | 23 | B | 2019-01 |
| JP Saxe | 32 | C | 2019-02 |
I want the following result:
| student | aggregate performance | month |
|--------------|-----------------------|---------|
| Aardvark | 24 | 2019-01 |
| Jason Derulo | 26 | 2019-01 |
| Jason Derulo | 22 | 2019-02 |
| JP Saxe | 34 | 2019-01 |
| JP Saxe | 32 | 2019-02 |
In short, I want the sum of the performance column, irregardless of competition, for each month, into a new list.
I've tried things like where
and groupby
and sum
but I can't seem to quite get what I want.
Let us do
df.groupby(['month', 'student'])[['performance']].sum().reset_index()