
Polars Rolling Mean, fill start of window with null instead of shortened window


My question is whether a Polars rolling window can return null at the start until the full window can be filled. For example:

import polars as pl

dates = [
    "2020-01-01",
    "2020-01-02",
    "2020-01-03",
    "2020-01-04",
    "2020-01-05",
    "2020-01-06",
    "2020-01-01",
    "2020-01-02",
    "2020-01-03",
    "2020-01-04",
    "2020-01-05",
    "2020-01-06",
]
df = pl.DataFrame(
    {
        "dt": dates,
        "a": [3, 4, 2, 8, 10, 1, 1, 7, 5, 9, 2, 1],
        "b": ["Yes"] * 6 + ["No"] * 6,
    }
).with_columns(pl.col("dt").str.strptime(pl.Date, "%Y-%m-%d"))
# Actually sort by the index column; calling set_sorted() on the unsorted
# "dt" column would only set a flag and could lead to wrong results
df = df.sort("dt")

# Time-based rolling mean over a 2-day window, computed per group "b"
df.rolling(index_column="dt", period="2d", group_by="b").agg(
    pl.col("a").mean().alias("ma_2d")
)

Result

b       dt          ma_2d
str     date        f64
"Yes"   2020-01-01  3.0
"Yes"   2020-01-02  3.5
"Yes"   2020-01-03  3.0
"Yes"   2020-01-04  5.0
"Yes"   2020-01-05  9.0

My expectation in this case is that the first day of each group should be null, because there aren't yet 2 days of data to fill the window. Instead, Polars appears to compute the mean over a truncated window for the starting days.


Solution

  • There is a feature request (#12798) to implement a min_periods/min_samples parameter that would have this effect; the same behavior is also discussed in issue #12049.