Search code examples
pythonpython-polars

polars rolling by option not allowed?


I have a data frame of the type:

df = pl.LazyFrame({"day": [1,2,4,5,2,3,5,6], 'type': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'], "value": [1, 0, 3, 4, 2, 2, 0, 1]})


day type    value
i64 str i64
1   "a" 1
2   "a" 0
4   "a" 3
5   "a" 4
2   "b" 2
3   "b" 2
5   "b" 0
6   "b" 1

I am trying to create a rolling sum variable, summing, for each different "type", the values in a two days window. Ideally, the resulting dataset would be the following:

day type value rolling_sum
1 a 1 1
2 a 0 1
4 a 3 3
5 a 4 7
2 b 2 2
3 b 2 4
5 b 0 0
6 b 1 1

I tried using the following code:

df = df.with_columns(pl.col("value")
                     .rolling(index_column="day", by="type", period="2i")
                     .sum().alias("rolling_sum"))

but I get the error: "TypeError: rolling() got an unexpected keyword argument 'by'".

Could you help me fix it?


Solution

  • That's because in your code you're trying to use Expr.rolling() which doesn't have by parameter (strangely, it is mentioned in the documentation under check_sorted parameter - is it just not implemented yet?), instead of DataFrame.rolling().

    If you'd restructure the code to use the latter then it works fine:

    (
        df.rolling(
            index_column="day", by="type", period="2i"
        )
        .agg(
            pl.col('value').sum().alias("rolling_sum")
        )
    )
    
    ┌──────┬─────┬─────────────┐
    │ type ┆ day ┆ rolling_sum │
    │ ---  ┆ --- ┆ ---         │
    │ str  ┆ i64 ┆ i64         │
    ╞══════╪═════╪═════════════╡
    │ a    ┆ 1   ┆ 1           │
    │ a    ┆ 2   ┆ 1           │
    │ a    ┆ 4   ┆ 3           │
    │ a    ┆ 5   ┆ 7           │
    │ b    ┆ 2   ┆ 2           │
    │ b    ┆ 3   ┆ 4           │
    │ b    ┆ 5   ┆ 0           │
    │ b    ┆ 6   ┆ 1           │
    └──────┴─────┴─────────────┘
    

    If you need to have value column in your result, you can use Expr.rolling_sum() combined with Expr.over() instead (assuming your DataFrame is sorted by day already):

    df.with_columns(
        pl.col("value")
        .rolling_sum(window_size=2,min_periods=0)
        .over("type")
        .alias('rolling_sum')
    )
    
    ┌─────┬──────┬───────┬─────────────┐
    │ day ┆ type ┆ value ┆ rolling_sum │
    │ --- ┆ ---  ┆ ---   ┆ ---         │
    │ i64 ┆ str  ┆ i64   ┆ i64         │
    ╞═════╪══════╪═══════╪═════════════╡
    │ 1   ┆ a    ┆ 1     ┆ 1           │
    │ 2   ┆ a    ┆ 0     ┆ 1           │
    │ 4   ┆ a    ┆ 3     ┆ 3           │
    │ 5   ┆ a    ┆ 4     ┆ 7           │
    │ 2   ┆ b    ┆ 2     ┆ 2           │
    │ 3   ┆ b    ┆ 2     ┆ 4           │
    │ 5   ┆ b    ┆ 0     ┆ 2           │
    │ 6   ┆ b    ┆ 1     ┆ 1           │
    └─────┴──────┴───────┴─────────────┘
    

    Ideally, I would probably expect Expr.rolling together with Expr.over to work:

    # something like this
    df.with_columns(
        pl.col("value")
        .rolling(index_column="day", period="2i")
        .sum()
        .over("type")
        .alias('rolling_sum')
    )
    
    # or this
    df.set_sorted(['type','day']).with_columns(
        pl.col("value")
        .sum()
        .over('type')
        .rolling(index_column="day", period="2i")
        .alias('rolling_sum')
    )
    

    but unfortunately, it doesn't:

    InvalidOperationError: rolling expression not allowed in aggregation
    

    Update

    Using rolling_sum() might not be something you want, if you plan your window to be based on days / weeks etc. In this case you can still use DataFrame.rolling() and combine it with Expr.last() inside of GroupBy.agg() to get the last value in the window:

    (
        df.rolling(
            index_column="day", by="type", period="2i"
        )
        .agg(
            pl.col('value').last(),
            pl.col('value').sum().alias("rolling_sum")
        )
    )
    
    ┌──────┬─────┬───────┬─────────────┐
    │ type ┆ day ┆ value ┆ rolling_sum │
    │ ---  ┆ --- ┆ ---   ┆ ---         │
    │ str  ┆ i64 ┆ i64   ┆ i64         │
    ╞══════╪═════╪═══════╪═════════════╡
    │ a    ┆ 1   ┆ 1     ┆ 1           │
    │ a    ┆ 2   ┆ 0     ┆ 1           │
    │ a    ┆ 4   ┆ 3     ┆ 3           │
    │ a    ┆ 5   ┆ 4     ┆ 7           │
    │ b    ┆ 2   ┆ 2     ┆ 2           │
    │ b    ┆ 3   ┆ 2     ┆ 4           │
    │ b    ┆ 5   ┆ 0     ┆ 0           │
    │ b    ┆ 6   ┆ 1     ┆ 1           │
    └──────┴─────┴───────┴─────────────┘