I have a polars dataframe as follows:
df = pl.from_repr("""
┌─────┬───────┐
│ day ┆ value │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═══════╡
│ 1 ┆ 1 │
│ 1 ┆ 2 │
│ 1 ┆ 2 │
│ 3 ┆ 3 │
│ 3 ┆ 5 │
│ 3 ┆ 2 │
│ 5 ┆ 1 │
│ 5 ┆ 2 │
│ 8 ┆ 7 │
│ 8 ┆ 3 │
│ 9 ┆ 5 │
│ 9 ┆ 3 │
│ 9 ┆ 4 │
└─────┴───────┘
""")
I want to incrementally rotate the values in column 'day'? By incremental rotation, I mean for each value, change it to its next larger value exists in the column, and if the value is the largest, then change it to null/None.
Basically, the result I expect should be the following:
┌──────┬───────┐
│ day ┆ value │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞══════╪═══════╡
│ 3 ┆ 1 │
│ 3 ┆ 2 │
│ 3 ┆ 2 │
│ 5 ┆ 3 │
│ 5 ┆ 5 │
│ 5 ┆ 2 │
│ 8 ┆ 1 │
│ 8 ┆ 2 │
│ 9 ┆ 7 │
│ 9 ┆ 3 │
│ null ┆ 5 │
│ null ┆ 3 │
│ null ┆ 4 │
└──────┴───────┘
Is there some particular polars-python idiomatic way to achieve this?
If day is sorted - you could group together - shift - then explode back?
(df.groupby("day", maintain_order=True)
.agg_list()
.with_columns(pl.col("day").shift(-1))
.explode(pl.exclude("day")))
shape: (13, 2)
┌──────┬───────┐
│ day | value │
│ --- | --- │
│ i64 | i64 │
╞══════╪═══════╡
│ 3 | 1 │
│ 3 | 2 │
│ 3 | 2 │
│ 5 | 3 │
│ 5 | 5 │
│ 5 | 2 │
│ 8 | 1 │
│ 8 | 2 │
│ 9 | 7 │
│ 9 | 3 │
│ null | 5 │
│ null | 3 │
│ null | 4 │
└──────┴───────┘
Perhaps another approach is to .rank()
the column.
.search_sorted()
for rank + 1
could find the positions of the next "group".
The max values could be nulled out then passed to .take()
to get the new values.
(df.with_columns(
pl.col("day").rank("dense")
.cast(pl.Int64)
.alias("rank"))
.with_columns(
pl.col("rank")
.search_sorted(pl.col("rank") + 1)
.alias("idx"))
.with_columns(
pl.when(pl.col("idx") != pl.col("idx").max())
.then(pl.col("idx"))
.alias("idx"))
.with_columns(
pl.col("day").take(pl.col("idx"))
.alias("new"))
)
shape: (13, 5)
┌─────┬───────┬──────┬──────┬──────┐
│ day | value | rank | idx | new │
│ --- | --- | --- | --- | --- │
│ i64 | i64 | i64 | u32 | i64 │
╞═════╪═══════╪══════╪══════╪══════╡
│ 1 | 1 | 1 | 3 | 3 │
│ 1 | 2 | 1 | 3 | 3 │
│ 1 | 2 | 1 | 3 | 3 │
│ 3 | 3 | 2 | 6 | 5 │
│ 3 | 5 | 2 | 6 | 5 │
│ 3 | 2 | 2 | 6 | 5 │
│ 5 | 1 | 3 | 8 | 8 │
│ 5 | 2 | 3 | 8 | 8 │
│ 8 | 7 | 4 | 10 | 9 │
│ 8 | 3 | 4 | 10 | 9 │
│ 9 | 5 | 5 | null | null │
│ 9 | 3 | 5 | null | null │
│ 9 | 4 | 5 | null | null │
└─────┴───────┴──────┴──────┴──────┘
Feels like I'm missing an obvious simpler approach here..