I'd like to use the bucket expression with groupby to downsample on a monthly basis, as the downsampling function will be deprecated. Is there an easy way to do this? datetime.timedelta only works on days and smaller units.
With the landing of group_by_dynamic we can now downsample and use the whole expression API for our aggregations, meaning we can resample by either upsampling or downsampling.
Let's go through an example:
import polars as pl
from datetime import datetime

df = pl.DataFrame(
    {
        "time": pl.datetime_range(
            start=datetime(2021, 12, 16),
            end=datetime(2021, 12, 16, 3),
            interval="30m",
            eager=True,
        ),
        "groups": ["a", "a", "a", "b", "b", "a", "a"],
        "values": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    }
)
print(df)
shape: (7, 3)
┌─────────────────────┬────────┬────────┐
│ time ┆ groups ┆ values │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ f64 │
╞═════════════════════╪════════╪════════╡
│ 2021-12-16 00:00:00 ┆ a ┆ 1.0 │
│ 2021-12-16 00:30:00 ┆ a ┆ 2.0 │
│ 2021-12-16 01:00:00 ┆ a ┆ 3.0 │
│ 2021-12-16 01:30:00 ┆ b ┆ 4.0 │
│ 2021-12-16 02:00:00 ┆ b ┆ 5.0 │
│ 2021-12-16 02:30:00 ┆ a ┆ 6.0 │
│ 2021-12-16 03:00:00 ┆ a ┆ 7.0 │
└─────────────────────┴────────┴────────┘
Upsampling can be done by specifying a finer interval. This yields a DataFrame
with null values, which can then be filled with a fill strategy or interpolation.
df.upsample("time", every="15m").fill_null(strategy="forward")
shape: (13, 3)
┌─────────────────────┬────────┬────────┐
│ time ┆ groups ┆ values │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ f64 │
╞═════════════════════╪════════╪════════╡
│ 2021-12-16 00:00:00 ┆ a ┆ 1.0 │
│ 2021-12-16 00:15:00 ┆ a ┆ 1.0 │
│ 2021-12-16 00:30:00 ┆ a ┆ 2.0 │
│ 2021-12-16 00:45:00 ┆ a ┆ 2.0 │
│ 2021-12-16 01:00:00 ┆ a ┆ 3.0 │
│ … ┆ … ┆ … │
│ 2021-12-16 02:00:00 ┆ b ┆ 5.0 │
│ 2021-12-16 02:15:00 ┆ b ┆ 5.0 │
│ 2021-12-16 02:30:00 ┆ a ┆ 6.0 │
│ 2021-12-16 02:45:00 ┆ a ┆ 6.0 │
│ 2021-12-16 03:00:00 ┆ a ┆ 7.0 │
└─────────────────────┴────────┴────────┘
(df.upsample("time", every="15m")
.interpolate()
.fill_null(strategy="forward") # string columns cannot be interpolated
)
shape: (13, 3)
┌─────────────────────┬────────┬────────┐
│ time ┆ groups ┆ values │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ str ┆ f64 │
╞═════════════════════╪════════╪════════╡
│ 2021-12-16 00:00:00 ┆ a ┆ 1.0 │
│ 2021-12-16 00:15:00 ┆ a ┆ 1.5 │
│ 2021-12-16 00:30:00 ┆ a ┆ 2.0 │
│ 2021-12-16 00:45:00 ┆ a ┆ 2.5 │
│ 2021-12-16 01:00:00 ┆ a ┆ 3.0 │
│ … ┆ … ┆ … │
│ 2021-12-16 02:00:00 ┆ b ┆ 5.0 │
│ 2021-12-16 02:15:00 ┆ b ┆ 5.5 │
│ 2021-12-16 02:30:00 ┆ a ┆ 6.0 │
│ 2021-12-16 02:45:00 ┆ a ┆ 6.5 │
│ 2021-12-16 03:00:00 ┆ a ┆ 7.0 │
└─────────────────────┴────────┴────────┘
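Note that both fills above run over the whole frame, so values can propagate across group boundaries. If you want to upsample every group separately, upsample also takes a group_by argument. A minimal sketch, assuming a reasonably recent Polars version (the data must be sorted by the group keys and then the time column; output omitted):

out = (
    df.sort("groups", "time")
    .upsample("time", every="15m", group_by="groups")
    .fill_null(strategy="forward")  # fills now stay within each group
)
print(out)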
Downsampling is the more powerful operation, because we can also combine it with normal groupby keys. This gives a virtual moving window over the time series (grouped by one or multiple keys) that can be aggregated with the expression API.
(
    df.group_by_dynamic(
        index_column="time",
        every="1h",
        closed="both",
        group_by="groups",
        include_boundaries=True,
    )
    .agg(
        count=pl.col("time").count(),
        max=pl.col("time").max(),
        sum=pl.col("values").sum(),
    )
)
shape: (6, 7)
┌────────┬─────────────────────┬─────────────────────┬─────────────────────┬───────┬─────────────────────┬──────┐
│ groups ┆ _lower_boundary ┆ _upper_boundary ┆ time ┆ count ┆ max ┆ sum │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ datetime[μs] ┆ datetime[μs] ┆ datetime[μs] ┆ u32 ┆ datetime[μs] ┆ f64 │
╞════════╪═════════════════════╪═════════════════════╪═════════════════════╪═══════╪═════════════════════╪══════╡
│ a ┆ 2021-12-16 00:00:00 ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 00:00:00 ┆ 3 ┆ 2021-12-16 01:00:00 ┆ 6.0 │
│ a ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 1 ┆ 2021-12-16 01:00:00 ┆ 3.0 │
│ a ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 2 ┆ 2021-12-16 03:00:00 ┆ 13.0 │
│ a ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 04:00:00 ┆ 2021-12-16 03:00:00 ┆ 1 ┆ 2021-12-16 03:00:00 ┆ 7.0 │
│ b ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 2 ┆ 2021-12-16 02:00:00 ┆ 9.0 │
│ b ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 1 ┆ 2021-12-16 02:00:00 ┆ 5.0 │
└────────┴─────────────────────┴─────────────────────┴─────────────────────┴───────┴─────────────────────┴──────┘
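To come back to the original question: monthly buckets are just another interval string, since Polars duration strings are calendar-aware, unlike datetime.timedelta. A minimal sketch on the same df (the "monthly_sum" and "month" aliases are just for illustration):

# Downsample into calendar-month buckets with group_by_dynamic.
(
    df.group_by_dynamic(index_column="time", every="1mo", group_by="groups")
    .agg(pl.col("values").sum().alias("monthly_sum"))
)

# Equivalently, bucket with a truncated datetime and a plain group_by.
(
    df.group_by("groups", pl.col("time").dt.truncate("1mo").alias("month"))
    .agg(pl.col("values").sum().alias("monthly_sum"))
)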