Search code examples
pythonpandasdataframegroup-by

how to convert monthly data to quarterly in pandas


I have monthly data. I want to convert it to "periods" of 3 months where q1 starts in January. So in the example below, the first 3 month aggregation would translate into start of q2 (desired format: 1996q2). And the data value that results from mushing together 3 monthly values is a mean (average) of 3 columns. Conceptually, not complicated. Does anyone know how to do it in one swoop? Potentially, I could do a lot of hard work through looping and just hardcode the hell out of it, but I am new to pandas and looking for something more clever than brute force.

1996-04   1996-05 1996-06  1996-07 .....
25          19       37      40

So I am looking for:

1996q2  1996q3   1996q4  1997q1  1997q2 .....
 avg      avg      avg     ...     ...

Solution

  • you can use pd.PeriodIndex(..., freq='Q') in conjunction with groupby(..., axis=1):

    In [63]: df
    Out[63]:
       1996-04  1996-05  2000-07  2000-08  2010-10  2010-11  2010-12
    0        1        2        3        4        1        1        1
    1       25       19       37       40        1        2        3
    2       10       20       30       40        4        4        5
    
    In [64]: df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1).mean()
    Out[64]:
       1996Q2  2000Q3    2010Q4
    0     1.5     3.5  1.000000
    1    22.0    38.5  2.000000
    2    15.0    35.0  4.333333
    

    UPDATE: to get columns in a resulting DF as strings intead of period dtype:

    In [66]: res = (df.groupby(pd.PeriodIndex(df.columns, freq='Q'), axis=1)
                      .mean()
                      .rename(columns=lambda c: str(c).lower()))
    
    In [67]: res
    Out[67]:
       1996q2  2000q3    2010q4
    0     1.5     3.5  1.000000
    1    22.0    38.5  2.000000
    2    15.0    35.0  4.333333
    
    In [68]: res.columns.dtype
    Out[68]: dtype('O')