I'm trying to downsample a dataframe in polars which has a column with timestamps. I want to downsample the column to intervals of 1s. But when i do this using group_by_dynamic only the seconds which have values are coming in the result. But I want the result to contain even seconds which has no entries so I can forward fill.
I tried group_by_dynamic, and it doesnt seem to do what I want.
taq_df.group_by_dynamic("timestamp", every = '1s').agg(pl.exclude("bestex_timestamp").last())
TLDR. Applying pl.DataFrame.upsample
after the aggregation will fill the missing timestamps.
Consider the following example frame.
import polars as pl
from datetime import datetime
df = pl.DataFrame({
"timestamp": [
datetime(2024, 1, 1, 0, 0, 0, 0),
datetime(2024, 1, 1, 0, 0, 2, 5),
datetime(2024, 1, 1, 0, 0, 3, 0),
datetime(2024, 1, 1, 0, 0, 5, 1),
],
"value": [1, 2, 3, 4],
}).set_sorted("timestamp")
shape: (4, 2)
┌────────────────────────────┬───────┐
│ timestamp ┆ value │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞════════════════════════════╪═══════╡
│ 2024-01-01 00:00:00 ┆ 1 │
│ 2024-01-01 00:00:02.000005 ┆ 2 │
│ 2024-01-01 00:00:03 ┆ 3 │
│ 2024-01-01 00:00:05.000001 ┆ 4 │
└────────────────────────────┴───────┘
Simply applying pl.DataFrame.group_by_dynamic
as suggested in the question, results in the following dataframe (with missing timestamps for seconds 1 and 4).
(
df
.group_by_dynamic(index_column="timestamp", every="1s")
.agg(pl.col("value").last())
)
shape: (4, 2)
┌─────────────────────┬───────┐
│ timestamp ┆ value │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═══════╡
│ 2024-01-01 00:00:00 ┆ 1 │
│ 2024-01-01 00:00:02 ┆ 2 │
│ 2024-01-01 00:00:03 ┆ 3 │
│ 2024-01-01 00:00:05 ┆ 4 │
└─────────────────────┴───────┘
Now, applying pl.DataFrame.upsample
fills the missing timestamps. All other column will have null values for the new rows.
(
df
.group_by_dynamic(index_column="timestamp", every="1s")
.agg(pl.col("value").last())
.upsample("timestamp", every="1s")
)
shape: (6, 2)
┌─────────────────────┬───────┐
│ timestamp ┆ value │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪═══════╡
│ 2024-01-01 00:00:00 ┆ 1 │
│ 2024-01-01 00:00:01 ┆ null │
│ 2024-01-01 00:00:02 ┆ 2 │
│ 2024-01-01 00:00:03 ┆ 3 │
│ 2024-01-01 00:00:04 ┆ null │
│ 2024-01-01 00:00:05 ┆ 4 │
└─────────────────────┴───────┘