python-polars

group_by_dynamic with a self-designed index


How can I do something like group_by_dynamic, but with support for a user-defined index?

group_by_dynamic can take a time index and perform a resample-style operation,

but it seems to support only non-overlapping ranges. For example, given

time
day1   9:00
day1 15:00
day2  9:00
day2  15:00
day3  9:00
day3 15:00

a dynamic group-by with a 1-day window gives


day1  9:00
day1 15:00
--------------
day2  9:00
day2  15:00
-------------
day3  9:00
day3 15:00

The feature I am asking for is a more user-defined dynamic group-by, where the windows may overlap, so the same index values can appear in more than one group:

day1  9:00
day1 15:00

day2  9:00
day2  15:00
-------------
day2  9:00
day2  15:00
day3  9:00
day3 15:00
--------------

I can use rolling on a series, but rolling_apply wastes a lot of time because it rolls over every index:

day1  9:00
day1 15:00

day2  9:00
day2  15:00
-------------
day1 15:00
day2  9:00
day2  15:00
day3  9:00      
--------------  -------> this window is useless
day2  9:00
day2  15:00
day3  9:00
day3  15:00
-------------

day2  15:00
day3  9:00
day3  15:00
day4  9:00   
------------  -------> this window is useless



Solution

  • The solution is to give every and period different values.

    • every sets the step between window starts, which determines the index of the output.

    • period sets the length of each window.

    Examples

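    import datetime

    import polars as pl

    # 13 timestamps, 12 hours apart, from 2021-12-16 00:00 to 2021-12-22 00:00
    df = pl.DataFrame(
        {
            "time": pl.datetime_range(
                datetime.datetime(2021, 12, 16),
                datetime.datetime(2021, 12, 22),
                interval="12h",
                eager=True,
            ),
            "n": [1] * 13,
        }
    )

    # every="1d" starts a new window each day; period="2d" makes each window
    # two days long, so consecutive windows overlap by one day.
    df.group_by_dynamic(
        "time",
        period="2d",
        every="1d",
        include_boundaries=True,
        closed="right",
    ).agg(pl.col("n").sum())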