Search code examples
pandasdataframeaggregate-functions

How do I get the first element in a rolling window on Pandas data frame


I have the following data structure: enter image description here

and what I'd like to do is for each location, date to determine what was the spending exactly n days ago.

I am trying with a rolling window here but for some reason, when I create the window, I don't know how to get the first element in it:

dummy_data["spending_3_days_ago"] = (
        dummy_data.sort_values("date")
        .groupby("location")["spendings"]
        .transform(lambda x: x.rolling(3, min_periods=1).??first??())
    ) 

I can call all type of aggregate functions on the window such as: sum, mean etc. but I can not get the first.


Solution

  • Don't use a rolling but rather a groupby.shift:

    dummy_data["spending_3_days_ago"] = (
            dummy_data.sort_values("date")
            .groupby("location")["spendings"]
            .shift(3)
        )
    

    Output:

       location        date  spendings  spending_3_days_ago
    0      1122  2017-12-01     451.12                  NaN
    1      1122  2017-12-02     542.43                  NaN
    2      1122  2017-12-03     512.23                  NaN
    3      1122  2017-12-04     821.23               451.12
    4      2233  2017-12-01     892.31                  NaN
    5      2233  2017-12-02     303.32                  NaN
    6      2233  2017-12-03     673.32                  NaN
    7      2233  2017-12-04    1238.32               892.31
    

    Or, if you want to simulate min_periods=1, backpropagate the values:

    dummy_data["spending_3_days_ago"] = (
            dummy_data.sort_values("date")
            .groupby("location")["spendings"]
            .apply(lambda g: g.shift(3).bfill())
        )
    

    Output:

       location        date  spendings  spending_3_days_ago
    0      1122  2017-12-01     451.12               451.12
    1      1122  2017-12-02     542.43               451.12
    2      1122  2017-12-03     512.23               451.12
    3      1122  2017-12-04     821.23               451.12
    4      2233  2017-12-01     892.31               892.31
    5      2233  2017-12-02     303.32               892.31
    6      2233  2017-12-03     673.32               892.31
    7      2233  2017-12-04    1238.32               892.31