Search code examples
pandasdataframegroup-byrolling-computationpandas-resample

Calculate the rolling average every two weeks for the same day and hour in a DataFrame


I have a Dataframe like the following:

df = pd.DataFrame()
df['datetime'] = pd.date_range(start='2023-1-2', end='2023-1-29', freq='15min')
df['week'] = df['datetime'].apply(lambda x: int(x.isocalendar()[1]))
df['day_of_week'] = df['datetime'].dt.weekday
df['hour'] = df['datetime'].dt.hour
df['minutes'] = pd.DatetimeIndex(df['datetime']).minute
df['value'] = range(len(df))
df.set_index('datetime',inplace=True)


  df =                  week day_of_week hour minutes value
    datetime                    
    2023-01-02 00:00:00 1   0   0   0   0
    2023-01-02 00:15:00 1   0   0   15  1
    2023-01-02 00:30:00 1   0   0   30  2
    2023-01-02 00:45:00 1   0   0   45  3
    2023-01-02 01:00:00 1   0   1   0   4
    ... ... ... ... ... ...
    2023-01-08 23:00:00 1   6   23  0   668
    2023-01-08 23:15:00 1   6   23  15  669
    2023-01-08 23:30:00 1   6   23  30  670
    2023-01-08 23:45:00 1   6   23  45  671
    2023-01-09 00:00:00 2   0   0   0   672

And I want to calculate the average of the column "value" for the same hour/minute/day, every two consecutive weeks.

What I would like to get is the following:

df=
                                                    value
    day_of_week hour minutes    datetime    
              0    0       0    2023-01-02 00:00:00 NaN
                                2023-01-09 00:00:00 NaN
                                2023-01-16 00:00:00 336
                                2023-01-23 00:00:00 1008
                           15   2023-01-02 00:15:00 NaN
                                2023-01-09 00:15:00 NaN
                                2023-01-16 00:15:00 337
                                2023-01-23 00:15:00 1009

So the first two weeks should have NaN values and week-3 should be the average of week-1 and week-2 and then week-4 the average of week-2 and week-3 and so on.

I tried the following code but it does not seem to do what I expect:

df = pd.DataFrame(df.groupby(['day_of_week','hour','minutes'])['value'].rolling(window='14D', min_periods=1).mean())

As what I am getting is:

                                                value
day_of_week hour minutes.  datetime 
0           0    0         2023-01-02 00:00:00  0
                           2023-01-09 00:00:00  336
                           2023-01-16 00:00:00  1008
                           2023-01-23 00:00:00  1680
                 15        2023-01-02 00:15:00  1
                           2023-01-09 00:15:00  337
                           2023-01-16 00:15:00  1009
                           2023-01-23 00:15:00  1681

Solution

  • I think you want to shift within each group. Then you need another groupby:

    (df.groupby(['day_of_week','hour','minutes'])['value']
       .rolling(window='14D', min_periods=2).mean()         # `min_periods` is different
       .groupby(['day_of_week','hour','minutes']).shift()   # shift within each group
       .to_frame()
    )
    

    Output:

                                                   value
    day_of_week hour minutes datetime                   
    0           0    0       2023-01-02 00:00:00     NaN
                             2023-01-09 00:00:00     NaN
                             2023-01-16 00:00:00   336.0
                             2023-01-23 00:00:00  1008.0
                     15      2023-01-02 00:15:00     NaN
    ...                                              ...
    6           23   30      2023-01-15 23:30:00     NaN
                             2023-01-22 23:30:00  1006.0
                     45      2023-01-08 23:45:00     NaN
                             2023-01-15 23:45:00     NaN
                             2023-01-22 23:45:00  1007.0