Search code examples
pythonpandasresampling

Pandas: Aggregate by month for every subgroup


I have the following pandas table

                          TUFNWGTP  TELFS  t070101  t070102  t070103  t070104  \
TUDIARYDATE status                                                              
2003-01-03  emp     8155462.672158      2        0        0        0        0   
2003-01-04  emp     1735322.527819      1        0        0        0        0   
            emp     3830527.482672      2       60        0        0        0   
2003-01-02  unemp   6622022.995205      4        0        0        0        0   
2003-01-09  emp     3068387.344956      1        0        0        0        0

and I want to aggregate the daily data to monthly data, for every subgroup.

That is, if there was no status subindex, I would do

df.resample('M', how='sum')

How can I do the monthly aggregation for every subgroup?


Solution

  • I think you need to have a DatetimeIndex (rather than a MultiIndex):

    In [11]: df = df.reset_index('status')
    
    In [12]: df
    Out[12]:
                status        TUFNWGTP  TELFS  t070101  t070102  t070103  t070104
    TUDIARYDATE
    2003-01-03     emp  8155462.672158      2        0        0        0        0
    2003-01-04     emp  1735322.527819      1        0        0        0        0
    2003-01-04     emp  3830527.482672      2       60        0        0        0
    2003-01-02   unemp  6622022.995205      4        0        0        0        0
    2003-01-09     emp  3068387.344956      1        0        0        0        0
    

    then do a groupby with a monthly Grouper and the status column:

    In [13]: df.groupby([pd.Grouper(freq='M'), 'status']).sum()
    Out[13]:
                               TUFNWGTP  TELFS  t070101  t070102  t070103  t070104
    TUDIARYDATE status
    2003-01-31  emp     16789700.027605      6       60        0        0        0
                unemp    6622022.995205      4        0        0        0        0