given the following example table
Index | Date | Weekday | Value |
---|---|---|---|
1 | 05/12/2022 | 2 | 10 |
2 | 06/12/2022 | 3 | 20 |
3 | 07/12/2022 | 4 | 40 |
4 | 09/12/2022 | 6 | 10 |
5 | 10/12/2022 | 7 | 60 |
6 | 11/12/2022 | 1 | 30 |
7 | 12/12/2022 | 2 | 40 |
8 | 13/12/2022 | 3 | 50 |
9 | 14/12/2022 | 4 | 60 |
10 | 16/12/2022 | 6 | 20 |
11 | 17/12/2022 | 7 | 50 |
12 | 18/12/2022 | 1 | 10 |
13 | 20/12/2022 | 3 | 20 |
14 | 21/12/2022 | 4 | 10 |
15 | 22/12/2022 | 5 | 40 |
I want to calculate a rolling average of the last three observations (at least) a week ago. I cannot use .shift as some dates are randomly missing, and .shift would therefore not produce a reliable output.
Desired output example for last three rows in the example dataset:
Index 13: Avg of indices 8, 7, 6 = (30+40+50) / 3 = 40
Index 14: Avg of indices 9, 8, 7 = (40+50+60) / 3 = 50
Index 15: Avg of indices 9, 8, 7 = (40+50+60) / 3 = 50
What would be a working solution for this? Thanks!
Thanks!
MOSTLY inspired from @Aidis you could, make his solution an apply:
df['mean']=df.apply(lambda y: df["Value"][df['Date'] <= y['Date'] - pd.Timedelta(1, "W")].tail(3).mean(), axis=1)
or spliting the data at each call which may run faster if you have lots of data (to be tested):
df['mean']=df.apply(lambda y: df.loc[:y.name, "Value"][ df.loc[:y.name,'Date'] <= y['Date'] - pd.Timedelta(1, "W")].tail(3).mean(), axis=1)
which returns:
Index Date Weekday Value mean
0 1 2022-12-05 2 10 NaN
1 2 2022-12-06 3 20 NaN
2 3 2022-12-07 4 40 NaN
3 4 2022-12-09 6 10 NaN
4 5 2022-12-10 7 60 NaN
5 6 2022-12-11 1 30 NaN
6 7 2022-12-12 2 40 10.000000
7 8 2022-12-13 3 50 15.000000
8 9 2022-12-14 4 60 23.333333
9 10 2022-12-16 6 20 23.333333
10 11 2022-12-17 7 50 36.666667
11 12 2022-12-18 1 10 33.333333
12 13 2022-12-20 3 20 40.000000
13 14 2022-12-21 4 10 50.000000
14 15 2022-12-22 5 40 50.000000