Get the first, last occurrence in a time series df in polars

How do I get the first, last occurrence (>0) of a particular column in a time series df,

df = pl.from_repr("""
┌────────────┬────────────┬────────────┐
│ date       ┆ column_one ┆ column_two │
│ ---        ┆ ---        ┆ ---        │
│ date       ┆ f64        ┆ i64        │
╞════════════╪════════════╪════════════╡
│ 2024-06-01 ┆ 0.0        ┆ 0          │
│ 2024-06-02 ┆ 0.0        ┆ 1          │
│ 2024-06-03 ┆ 1.0        ┆ 2          │
│ 2024-06-04 ┆ 1.2        ┆ 3          │
└────────────┴────────────┴────────────┘
""")

Desired result:

shape: (2, 3)
┌────────────┬──────────────────┬─────────────────┐
│ columns    ┆ first_appearance ┆ last_appearance │
│ ---        ┆ ---              ┆ ---             │
│ str        ┆ date             ┆ date            │
╞════════════╪══════════════════╪═════════════════╡
│ column_one ┆ 2024-06-03       ┆ 2024-06-04      │
│ column_two ┆ 2024-06-02       ┆ 2024-06-04      │
└────────────┴──────────────────┴─────────────────┘

Solution

The desired result can be obtained by first using pl.DataFrame.unpivot to obtain a long frame with columns date, columns, and value. Then, grouping by columns gives first and last appearance > 0 within each time-series.

(
    df
    .unpivot(index="date", variable_name="columns")
    .group_by("columns")
    .agg(
        pl.col("date").filter(pl.col("value") > 0).min().alias("first_appearance"),
        pl.col("date").filter(pl.col("value") > 0).max().alias("last_appearance"),
    )
)

shape: (2, 3)
┌────────────┬──────────────────┬─────────────────┐
│ columns    ┆ first_appearance ┆ last_appearance │
│ ---        ┆ ---              ┆ ---             │
│ str        ┆ date             ┆ date            │
╞════════════╪══════════════════╪═════════════════╡
│ column_one ┆ 2024-06-03       ┆ 2024-06-04      │
│ column_two ┆ 2024-06-02       ┆ 2024-06-04      │
└────────────┴──────────────────┴─────────────────┘

Note. If the dataframe is sorted by date, then pl.Expr.first / pl.Expr.last can be used instead of pl.Expr.min / pl.Expr.max. Moreover, the two filter expressions within the aggrecation could be replaced by a single pl.DataFrame.filter before the aggregation.