Tags: python, pandas, group-by, pandas-groupby

Aggregate daily data by month and an additional column


I have a DataFrame of daily data that looks like this:

Date          Product Number    Description      Revenue
2010-01-04       4219-057       Product A        39.299999
2010-01-04       4219-056       Product A        39.520000
2010-01-04       4219-100       Product B        39.520000
2010-01-04       4219-056       Product A        39.520000
2010-01-05       4219-059       Product A        39.520000
2010-01-05       4219-056       Product A        39.520000
2010-01-05       4219-056       Product B        39.520000
2010-02-08       4219-123       Product A        39.520000
2010-02-08       4219-345       Product A        39.520000
2010-02-08       4219-456       Product B        39.520000
2010-02-08       4219-567       Product C        39.520000
2010-02-08       4219-789       Product D        39.520000

(The product numbers are only illustrative.) I want to aggregate this into monthly data, something like:

Date        Description        Revenue
2010-01-01    Product A        157.85000 (Sum of all Product A in Month 01)    
              Product B        79.040000
              Product C        00.000000
              Product D        00.000000
2010-02-01    Product A        39.299999 (Sum of all Product A in Month 02)   
              Product B        39.520000
              Product C        39.520000
              Product D        39.520000  

The problem is that I have 500+ products for every month.

I am new to Python and don't know how to implement this. Currently, I am using

import pandas as pd
import numpy as np
import matplotlib
%matplotlib inline

data.groupby(['DATE','REVENUE']).sum().unstack()

but this does not group by product.

How can I implement this?


Solution

  • Convert "Date" to datetime, then use groupby and sum:

    # Do this first, if necessary.
    df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
    
    (df.groupby([pd.Grouper(key='Date', freq='MS'), 'Description'])['Revenue']
       .sum()
       .reset_index())
    
            Date Description     Revenue
    0 2010-01-01           A  197.379999
    1 2010-01-01           B   79.040000
    2 2010-02-01           A   79.040000
    3 2010-02-01           B   39.520000
    4 2010-02-01           C   39.520000
    5 2010-02-01           D   39.520000
    

    The frequency "MS" (month start) groups the dates by calendar month and labels each group with the first day of that month.
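
For anyone who wants to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample rows from the question (the "Product Number" column is left out, since it is not used) and applies the same groupby. The question's desired output also lists products with no sales in a month as 0, which a plain groupby does not produce. The `unstack(fill_value=0).stack()` step at the end is one possible way to get those rows. It is an addition for illustration, not part of the answer above, and the names `monthly` and `monthly_filled` are made up here.

```python
import pandas as pd

# Sample rows copied from the question (Product Number omitted for brevity).
df = pd.DataFrame({
    "Date": ["2010-01-04"] * 4 + ["2010-01-05"] * 3 + ["2010-02-08"] * 5,
    "Description": [
        "Product A", "Product A", "Product B", "Product A",   # 2010-01-04
        "Product A", "Product A", "Product B",                # 2010-01-05
        "Product A", "Product A", "Product B", "Product C", "Product D",  # 2010-02-08
    ],
    "Revenue": [39.299999] + [39.52] * 11,
})
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Sum revenue per month start and product, as in the answer above.
monthly = (
    df.groupby([pd.Grouper(key="Date", freq="MS"), "Description"])["Revenue"]
      .sum()
)

# Assumed extra step: pivot products into columns, fill the gaps with 0,
# then stack back so every product appears in every month.
monthly_filled = (
    monthly.unstack(fill_value=0)
           .stack()
           .rename("Revenue")
           .reset_index()
)

print(monthly.reset_index())
print(monthly_filled)
```

With 500+ products the intermediate wide table has one column per product, which should still be manageable. If you only need the months and products that actually occur, `monthly.reset_index()` is enough.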