I have the following data structure:
and what I'd like to do is for each location, date to determine what was the spending exactly n days ago.
I am trying with a rolling window here but for some reason, when I create the window, I don't know how to get the first element in it:
dummy_data["spending_3_days_ago"] = (
dummy_data.sort_values("date")
.groupby("location")["spendings"]
.transform(lambda x: x.rolling(3, min_periods=1).??first??())
)
I can call all type of aggregate functions on the window such as: sum, mean etc. but I can not get the first.
Don't use a rolling
but rather a groupby.shift
:
dummy_data["spending_3_days_ago"] = (
dummy_data.sort_values("date")
.groupby("location")["spendings"]
.shift(3)
)
Output:
location date spendings spending_3_days_ago
0 1122 2017-12-01 451.12 NaN
1 1122 2017-12-02 542.43 NaN
2 1122 2017-12-03 512.23 NaN
3 1122 2017-12-04 821.23 451.12
4 2233 2017-12-01 892.31 NaN
5 2233 2017-12-02 303.32 NaN
6 2233 2017-12-03 673.32 NaN
7 2233 2017-12-04 1238.32 892.31
Or, if you want to simulate min_periods=1
, backpropagate the values:
dummy_data["spending_3_days_ago"] = (
dummy_data.sort_values("date")
.groupby("location")["spendings"]
.apply(lambda g: g.shift(3).bfill())
)
Output:
location date spendings spending_3_days_ago
0 1122 2017-12-01 451.12 451.12
1 1122 2017-12-02 542.43 451.12
2 1122 2017-12-03 512.23 451.12
3 1122 2017-12-04 821.23 451.12
4 2233 2017-12-01 892.31 892.31
5 2233 2017-12-02 303.32 892.31
6 2233 2017-12-03 673.32 892.31
7 2233 2017-12-04 1238.32 892.31