
How can I reduce the amount of data in a polars DataFrame?


I have a csv file with a size of 28 GB, which I want to plot. That is obviously far too many data points, so how can I reduce the data? I would like to merge about 1000 data points into one by calculating the mean. This is the structure of my DataFrame:

df = pl.from_repr("""
┌─────────────────┬────────────┐
│ Time in seconds ┆ Force in N │
│ ---             ┆ ---        │
│ f64             ┆ f64        │
╞═════════════════╪════════════╡
│ 0.0             ┆ 2310.18    │
│ 0.0005          ┆ 2313.23    │
│ 0.001           ┆ 2314.14    │
└─────────────────┴────────────┘
""")

I thought about using group_by_dynamic and then calculating the mean of each group, but that only seems to work with datetimes. My time column, however, is a float of seconds.


Solution

  • You can group by an integer column to create groups of size N:

    In case of a group_by_dynamic on an integer column, the windows are defined by:

    “1i” # length 1

    “10i” # length 10

    We can add a row index with with_row_index() and cast it to pl.Int64 (the row index is UInt32 by default, which group_by_dynamic does not accept) to use it as the index column.

    (df.with_row_index()
       .group_by_dynamic(pl.col("index").cast(pl.Int64), every="2i")
       .agg(pl.col("Force in N").mean())
    )
    
    shape: (2, 2)
    ┌───────┬────────────┐
    │ index ┆ Force in N │
    │ ---   ┆ ---        │
    │ i64   ┆ f64        │
    ╞═══════╪════════════╡
    │ 0     ┆ 2311.705   │
    │ 2     ┆ 2314.14    │
    └───────┴────────────┘