When performing a group_by_dynamic on an index column and aggregating a column into a list (to see which values fall in each group), for a given group and a given index i, the list contains values from the correct group, but for indices i, i+1, ... up to the end of the period. This look-ahead in the index seems to contrast with the rolling_mean function, which (by default) uses the values at lower indices, not higher ones, to compute the rolling mean.
Is it intentionally designed this way, and if so, how can one perform a group_by_dynamic using lower indices? (I am not sure what the offset parameter does, but it does not seem to be what I want here.)
Here is an example that I expected not to raise:
```python
import polars as pl

df = pl.DataFrame(
    {
        "index": [0, 0, 1, 1],
        "group": ["banana", "pear", "banana", "pear"],
        "weight": [2, 3, 5, 7],
    }
)
agg = df.group_by_dynamic("index", group_by="group", every="1i", period="2i").agg(pl.col("weight"))
assert (
    agg
    .filter(index=0, group="banana")
    .select("weight")
    .to_series()
    .to_list()
) == [[2]]
```
Thank you
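The look-ahead described in the question can be reproduced with a small pure-Python model of how rows are assigned to windows. This is a hypothetical sketch, not Polars' implementation: it assumes the defaults for an integer index are offset 0, closed="left" and label="left", so the window labelled t covers [t, t + period). Under those assumptions, index 0 with period 2 collects indices 0 and 1, which matches the [2, 5] reported for banana.

```python
# Hypothetical model of group_by_dynamic windows on an integer index
# (not Polars' code). Assumes offset=0, closed="left", label="left":
# the window labelled t covers the half-open interval [t, t + period).
def forward_windows(index, values, every, period):
    """Return {window_start: [values whose index is in [start, start + period)]}."""
    first, last = min(index), max(index)
    # Window starts are multiples of `every`, beginning at or before the first index
    start = first - (first % every)
    result = {}
    while start <= last:
        result[start] = [v for i, v in zip(index, values) if start <= i < start + period]
        start += every
    return result


# The "banana" rows from the example: index [0, 1], weight [2, 5]
print(forward_windows([0, 1], [2, 5], every=1, period=2))  # {0: [2, 5], 1: [5]}
```

Each window extends forward from its label, so index i sees i, i+1, ... up to the period, which is exactly the behaviour the assertion trips over.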
As mentioned, a solution is to use rolling instead.
With your example:
```
df.group_by_dynamic("index", group_by="group", every="1i", period="2i").agg(pl.col("weight"))
shape: (4, 3)
┌────────┬───────┬───────────┐
│ group ┆ index ┆ weight │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ list[i64] │
╞════════╪═══════╪═══════════╡
│ banana ┆ 0 ┆ [2, 5] │
│ banana ┆ 1 ┆ [5] │
│ pear ┆ 0 ┆ [3, 7] │
│ pear ┆ 1 ┆ [7] │
└────────┴───────┴───────────┘
```
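On the offset question: offset shifts where each window begins relative to its start point. The sketch below extends a pure-Python model (again hypothetical, not Polars' code) under the assumption that a window whose start is t covers [t + offset, t + offset + period). With every=1 and period=2, an offset of -(period - every) = -1 makes window t cover [t - 1, t + 1), i.e. the current index and the one before it. How Polars labels such a shifted window (t or t + offset) and which edge windows it emits depend on label, start_by and the Polars version, so this only models the interval arithmetic.

```python
# Hypothetical model (not Polars' code): a window whose start is t covers
# [t + offset, t + offset + period), closed on the left.
# A negative offset shifts every window towards lower indices.
def shifted_windows(index, values, every, period, offset):
    """Return {window_start: [values whose index is in [start + offset, start + offset + period)]}."""
    first, last = min(index), max(index)
    start = first - (first % every)
    result = {}
    while start <= last:
        lo, hi = start + offset, start + offset + period
        result[start] = [v for i, v in zip(index, values) if lo <= i < hi]
        start += every
    return result


# offset=-1 turns the forward-looking windows into backward-looking ones
print(shifted_windows([0, 1], [2, 5], every=1, period=2, offset=-1))  # {0: [2], 1: [2, 5]}
```

In this model the shifted windows contain the same values that rolling produces, which is why rolling is the more direct tool when the goal is "this index and the ones before it".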
With rolling:
```
df.rolling(index_column="index", period="2i", group_by="group").agg(pl.col("weight"))
shape: (4, 3)
┌────────┬───────┬───────────┐
│ group ┆ index ┆ weight │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ list[i64] │
╞════════╪═══════╪═══════════╡
│ banana ┆ 0 ┆ [2] │
│ banana ┆ 1 ┆ [2, 5] │
│ pear ┆ 0 ┆ [3] │
│ pear ┆ 1 ┆ [3, 7] │
└────────┴───────┴───────────┘
```
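The difference with rolling can be modelled the same way. This pure-Python sketch is hypothetical, not Polars' implementation; it assumes rolling's default closed="right", so the row with index t gets the window (t - period, t]. That produces one window per row, looking backward, and averaging each window gives the backward-looking mean the question expects from rolling_mean.

```python
# Hypothetical model of DataFrame.rolling on an integer index (not Polars' code):
# each row with index t gets the window (t - period, t], assuming closed="right".
def backward_windows(index, values, period):
    """Return one (t, [values whose index is in (t - period, t]]) pair per row, in row order."""
    return [
        (t, [v for i, v in zip(index, values) if t - period < i <= t])
        for t in index
    ]


rows = backward_windows([0, 1], [2, 5], period=2)
print(rows)  # [(0, [2]), (1, [2, 5])]

# Averaging each window gives a backward-looking rolling mean
means = [sum(w) / len(w) for _, w in rows]
print(means)  # [2.0, 3.5]
```

Replacing pl.col("weight") with pl.col("weight").mean() in the rolling call above should give the same per-row averages for this data.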