Search code examples
pythonpandasdata-analysis

Finding the sum of grouped data by column


My grouped data looks like:

deviceid                                  time          total_sent
022009f075929be71975ce70db19cd47780b112f  1980-January  36            4
                                                        52            1
                                                        94            1
                                                        211           1
                                                        278           1
                                                        318           2
                                                        370           1
                                                        426           1
                                                        430           1
                                                        435           1
                                                        560           1
                                                        674           1
                                                        797           1
                                                        813           4
                                                        816           1

ff5b22df4ab9207bb6709cddef6d95c655565578  2013-August  11308408       4
                                                       12075616       1
                                                       17933654       1
                                                       22754808      12
                                                       22754987       1
                                                       22755166       3
                                                       22755345       4
                                                       22788586       4
                                                       22788765       2
                                                       22788944       2
                                                       22791830       1
                                                       22792546       1
                                                       22796843       1
                                                       22797201       2
                                                       22797380       2

Where the last column represents the count. I obtained this grouped representation using the expression:

data1.groupby(['deviceid', 'time', 'total_sent'])

How do I sum the total_sent per month?

deviceid                                  time            sum    
022009f075929be71975ce70db19cd47780b112f  1980-January    6210
ff5b22df4ab9207bb6709cddef6d95c655565578  2013-August     XXXX          

Solution

  • Since total_sent column is to be summed, it shouldn't be within the groupby keys. You can try the following:

    data1.groupby(['deviceid', 'time']).agg({'total_sent': sum})
    

    which will sum the total_sent column for each group, indexed by deviceid and time.