How can I do something like group_by_dynamic, but with a user-defined index?
group_by_dynamic supports a time index, so it can be used like a resample,
but it seems to only support non-overlapping windows, such as:
time
day1 9:00
day1 15:00
day2 9:00
day2 15:00
day3 9:00
day3 15:00
Dynamic group-by with 1d windows:
day1 9:00
day1 15:00
--------------
day2 9:00
day2 15:00
-------------
day3 9:00
day3 15:00
The feature I am asking for is a more user-defined dynamic group-by, where windows may overlap, so the same index value can appear in more than one group:
day1 9:00
day1 15:00
day2 9:00
day2 15:00
-------------
day2 9:00
day2 15:00
day3 9:00
day3 15:00
--------------
I can use rolling on a series, but rolling_apply wastes a lot of time because it rolls over every index value:
day1 9:00
day1 15:00
day2 9:00
day2 15:00
-------------
day1 15:00
day2 9:00
day2 15:00
day3 9:00
-------------- -------> this window is useless
day2 9:00
day2 15:00
day3 9:00
day3 15:00
-------------
day2 15:00
day3 9:00
day3 15:00
day4 9:00
------------ -------> this window is useless
The solution is to give different values to every and period.
every decides the output of the index (how often a new window starts).
period gives the window you need (how long each window is).
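To make the interaction between the two parameters concrete, here is a minimal pure-Python sketch of the windowing idea. It is not polars' implementation: dynamic_windows is a hypothetical helper, the January 2024 dates are placeholders for day1..day3, and aligning the first window to midnight is an assumption made for the illustration.

```python
from datetime import datetime, timedelta


def dynamic_windows(times, every, period):
    """Hypothetical helper: yield (window_start, members) pairs.

    A new window starts every `every`; each window covers the
    half-open interval [start, start + period).
    """
    times = sorted(times)
    # Assumption: align the first window to midnight of the first timestamp.
    start = datetime.combine(times[0].date(), datetime.min.time())
    last = times[-1]
    while start <= last:
        end = start + period
        members = [t for t in times if start <= t < end]
        if members:
            yield start, members
        start += every


# Two observations per day (9:00 and 15:00) over three days, as in the question.
times = [datetime(2024, 1, d, h) for d in (1, 2, 3) for h in (9, 15)]

windows = list(
    dynamic_windows(times, every=timedelta(days=1), period=timedelta(days=2))
)
for start, members in windows:
    print(start.date(), [m.strftime("%d %H:%M") for m in members])
```

With every shorter than period, consecutive windows share rows (day2 appears in both the first and second window), but only one window is produced per day rather than one per row as with rolling. In this sketch the last window is partial because no data follows day3.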
Examples
import datetime

import polars as pl

df = pl.DataFrame(
    {
        "time": pl.datetime_range(
            datetime.datetime(2021, 12, 16),
            datetime.datetime(2021, 12, 22),
            interval="12h",
            eager=True,
        ),
        "n": [1] * 13,
    }
)
# A new window starts every 1d and each window spans 2d, so consecutive windows overlap.
df.group_by_dynamic(
    "time", period="2d", every="1d", include_boundaries=True, closed="right"
).agg(pl.col("n").sum())