Is there a way in Polars to get null at the start of a rolling window until the full window can be filled? For example:
import polars as pl

dates = [
    "2020-01-01",
    "2020-01-02",
    "2020-01-03",
    "2020-01-04",
    "2020-01-05",
    "2020-01-06",
    "2020-01-01",
    "2020-01-02",
    "2020-01-03",
    "2020-01-04",
    "2020-01-05",
    "2020-01-06",
]
df = pl.DataFrame(
    {
        "dt": dates,
        "a": [3, 4, 2, 8, 10, 1, 1, 7, 5, 9, 2, 1],
        "b": ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No"],
    }
).with_columns(
    # Parse the strings as dates; the column is not sorted yet at this point,
    # so it is sorted explicitly below instead of being flagged with set_sorted().
    pl.col("dt").str.strptime(pl.Date)
)
df = df.sort(by="dt")
# 2-day rolling mean of "a", computed separately for each value of "b"
df.rolling(
    index_column="dt", period="2d", group_by="b"
).agg(pl.col("a").mean().alias("ma_2d"))
Result (first rows shown):
b      dt          ma_2d
str    date        f64
"Yes"  2020-01-01  3.0
"Yes"  2020-01-02  3.5
"Yes"  2020-01-03  3.0
"Yes"  2020-01-04  5.0
"Yes"  2020-01-05  9.0
My expectation in this case is that the first day in each group should be null, because there aren't 2 days available to fill the window. Instead, Polars seems to compute the mean over the truncated window for the starting days.
There is a feature request (#12798) to implement a min_periods/min_samples parameter that would have this effect, and this is also discussed in issue #12049.