import polars as pl
data = {'type': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'value': [5, 9, 1, 0, 3, 2, 5, 8, 9, 1, 0, 3, 3, 1, 1, 0, 2, 0, 0, 5, 7, 4, 7, 8, 9, 11, 1, 1, 0, 1, 4, 3, 21]}
df = pl.DataFrame(data)
print(df)
Given this two-column DataFrame, how can I compute a rolling sum of the 'value' column with a window size of 5, separately within each 'type' group, and store the output in a new column named 'result'?
The expected result is:
[None, None, None, None, 18, 15, 11, 18, 27, 25, 23, 21, 16, None, None, None, None, 4, 3, 7, 14, 16, None, None, None, None, 36, 30, 22, 14, 7, 9, 29]
(Please use the polars library only; Polars version 0.17.9.)
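You can combine .rolling_sum and .over: .rolling_sum computes the windowed sum, and .over("type") evaluates it within each 'type' group independently, so the first four rows of every group are null.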
df.with_columns(
    result=pl.col("value").rolling_sum(window_size=5).over("type")
)
shape: (33, 3)
┌──────┬───────┬────────┐
│ type ┆ value ┆ result │
│ ---  ┆ ---   ┆ ---    │
│ str  ┆ i64   ┆ i64    │
╞══════╪═══════╪════════╡
│ A    ┆ 5     ┆ null   │
│ A    ┆ 9     ┆ null   │
│ A    ┆ 1     ┆ null   │
│ A    ┆ 0     ┆ null   │
│ …    ┆ …     ┆ …      │
│ C    ┆ 1     ┆ 14     │
│ C    ┆ 4     ┆ 7      │
│ C    ┆ 3     ┆ 9      │
│ C    ┆ 21    ┆ 29     │
└──────┴───────┴────────┘
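Because the printed frame is truncated, it may help to check the full column against the expected list from the question. Below is a minimal plain-Python sketch of what a per-group rolling sum computes. It does not use polars, and the helper name rolling_sum_by_group is made up for illustration. It assumes rows are processed in their original order and that a window with fewer than window_size values yields None, which matches the nulls shown above.

```python
from collections import defaultdict, deque


def rolling_sum_by_group(keys, values, window_size):
    """Rolling sum of `values`, computed separately for each key, in row order.

    Hypothetical reference helper, not part of polars.
    """
    # One fixed-length window per group; deque(maxlen=...) drops the oldest
    # value automatically once the window is full.
    windows = defaultdict(lambda: deque(maxlen=window_size))
    result = []
    for key, value in zip(keys, values):
        window = windows[key]
        window.append(value)
        # Emit None until the group's window holds `window_size` values.
        result.append(sum(window) if len(window) == window_size else None)
    return result


# Same data as in the question: 13 rows of 'A', 9 of 'B', 11 of 'C'.
types = ["A"] * 13 + ["B"] * 9 + ["C"] * 11
values = [5, 9, 1, 0, 3, 2, 5, 8, 9, 1, 0, 3, 3,
          1, 1, 0, 2, 0, 0, 5, 7, 4,
          7, 8, 9, 11, 1, 1, 0, 1, 4, 3, 21]

# Expected 'result' column, copied from the question.
expected = [None, None, None, None, 18, 15, 11, 18, 27, 25, 23, 21, 16,
            None, None, None, None, 4, 3, 7, 14, 16,
            None, None, None, None, 36, 30, 22, 14, 7, 9, 29]

result = rolling_sum_by_group(types, values, window_size=5)
print(result == expected)
```

The window restarts at every group boundary, which is why each group begins with four None values instead of carrying sums over from the previous group.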