Search code examples
pythonpandasdataframetime-seriescumsum

Getting sum data for smoothly shifting groups of 3 months of a months data in Pandas


I have a time series data of the following form:

  Item  2020 Jan  2020 Feb  2020 Mar  2020 Apr  2020 May  2020 Jun
0    A         0         1         2         3         4         5
1    B         5         4         3         2         1         0

This is monthly data but I want to get quarterly data of this data. A normal quarterly data would be calculated by summing up Jan-Mar and Apr-Jun and would look like this:

  Item  2020 Q1  2020 Q2
0    A        3       12
1    B       12        3

I want to get smoother quarterly data so it would shift by only 1 month for each new data item, not 3 months. So it would have Jan-Mar, then Feb-Apr, then Mar-May, and Apr-Jun. So the resulting data would look like this:

  Item  2020 Q1  2020 Q1  2020 Q1  2020 Q2
0    A        3        6        9       12
1    B       12        9        6        3

I believe this is similar to cumsum which can be used as follows:

df_dates = df.iloc[:,1:]
df_dates.cumsum(axis=1)

which leads to the following result:

   2020 Jan  2020 Feb  2020 Mar  2020 Apr  2020 May  2020 Jun
0         0         1         3         6        10        15
1         5         9        12        14        15        15

but instead of getting the sum over the whole time, it gets the sum of the nearest 3 months (a quarter). I do not know how this version of cumsum is called but I saw it in many places so I believe there might be a library function for that.


Solution

  • Let us solve in steps

    • Set the index to Item column
    • Parse the date like columns to quarterly period
    • Calculate the rolling sum with window of size 3
    • Shift the calculated rolling sum 2 units along the columns axis and get rid of the last two columns
    s = df.set_index('Item')
    s.columns = pd.PeriodIndex(s.columns, freq='M').strftime('%Y Q%q')
    
    s = s.rolling(3, axis=1).sum().shift(-2, axis=1).iloc[:, :-2]
    

    print(s)
    
          2020 Q1  2020 Q1  2020 Q1  2020 Q2
    Item                                    
    A         3.0      6.0      9.0     12.0
    B        12.0      9.0      6.0      3.0