Upsampling a polars dataframe with group_by

I'm trying to upsample a Polars dataframe while grouping by a particular column. In the following example, I wish to group by 'fruit' and then upsample by date.

import polars as pl

df = pl.from_repr("""
┌───────┬─────────────────────┬───────┐
│ fruit ┆ date                ┆ count │
│ ---   ┆ ---                 ┆ ---   │
│ str   ┆ datetime[ns]        ┆ i64   │
╞═══════╪═════════════════════╪═══════╡
│ apple ┆ 2022-06-01 00:00:00 ┆ 5     │
│ apple ┆ 2022-06-03 00:00:00 ┆ 6     │
│ apple ┆ 2022-06-04 00:00:00 ┆ 2     │
│ apple ┆ 2022-06-07 00:00:00 ┆ 1     │
│ pear  ┆ 2022-06-01 00:00:00 ┆ 9     │
│ pear  ┆ 2022-06-07 00:00:00 ┆ 12    │
└───────┴─────────────────────┴───────┘
""")

This is what the output should look like:

shape: (14, 3)
┌───────┬─────────────────────┬───────┐
│ fruit ┆ date                ┆ count │
│ ---   ┆ ---                 ┆ ---   │
│ str   ┆ datetime[ns]        ┆ i64   │
╞═══════╪═════════════════════╪═══════╡
│ apple ┆ 2022-06-01 00:00:00 ┆ 5     │
│ apple ┆ 2022-06-02 00:00:00 ┆ 5     │
│ apple ┆ 2022-06-03 00:00:00 ┆ 6     │
│ apple ┆ 2022-06-04 00:00:00 ┆ 2     │
│ apple ┆ 2022-06-05 00:00:00 ┆ 2     │
│ apple ┆ 2022-06-06 00:00:00 ┆ 2     │
│ apple ┆ 2022-06-07 00:00:00 ┆ 1     │
│ pear  ┆ 2022-06-01 00:00:00 ┆ 9     │
│ pear  ┆ 2022-06-02 00:00:00 ┆ 9     │
│ pear  ┆ 2022-06-03 00:00:00 ┆ 9     │
│ pear  ┆ 2022-06-04 00:00:00 ┆ 9     │
│ pear  ┆ 2022-06-05 00:00:00 ┆ 9     │
│ pear  ┆ 2022-06-06 00:00:00 ┆ 9     │
│ pear  ┆ 2022-06-07 00:00:00 ┆ 12    │
└───────┴─────────────────────┴───────┘

For a non group-by scenario, the following command gets me the result I need:

df.upsample('date', every='1d').fill_null(strategy="forward")

However, I haven't been able to get this working when the upsampling has to happen separately within each group.

P.S. Here is a similar question, but using pandas: "Pandas: resample timeseries with groupby".

  • I realized that the upsample function has a 'group_by' parameter that gives me the results that I need. Here is a link to API docs for the .upsample() method.