Search code examples
pythonpandasdataframerowcolumnsorting

Pandas: sum all rows with the same month


Say I have the following data:

| student      | performance | competititon | month   |
|--------------|-------------|--------------|---------|
| Aardvark     | 11          | A            | 2019-01 |
| Aardvark     | 13          | B            | 2019-01 |
| Jason Derulo | 11          | A            | 2019-01 |
| Jason Derulo | 15          | B            | 2019-01 |
| Jason Derulo | 22          | C            | 2019-02 |
| JP Saxe      | 11          | A            | 2019-01 |
| JP Saxe      | 23          | B            | 2019-01 |
| JP Saxe      | 32          | C            | 2019-02 |

I want the following result:

| student      | aggregate performance | month   |
|--------------|-----------------------|---------|
| Aardvark     | 24                    | 2019-01 |
| Jason Derulo | 26                    | 2019-01 |
| Jason Derulo | 22                    | 2019-02 |
| JP Saxe      | 34                    | 2019-01 |
| JP Saxe      | 32                    | 2019-02 |

In short, I want the sum of the performance column, irregardless of competition, for each month, into a new list.

I've tried things like where and groupby and sum but I can't seem to quite get what I want.


Solution

  • Let us do

    df.groupby(['month', 'student'])[['performance']].sum().reset_index()