I have a Polars DataFrame that looks something like so:
df = pl.DataFrame(
{
'val': [9, 7, 9, 11, 2, 5],
'count': [1, 2, 1, 2, 1, 2],
'id': [1, 1, 2, 2, 3, 3],
}
)
What I need is to create a new column 'prev_val'
which will contain values for the same unique id taken from a row where the value in 'count'
column is smaller by one, i.e. looking something like:
┌─────┬───────┬─────┬──────────┐
│ val ┆ count ┆ id ┆ prev_val │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═══════╪═════╪══════════╡
│ 9 ┆ 1 ┆ 1 ┆ null │
│ 7 ┆ 2 ┆ 1 ┆ 9 │
│ 9 ┆ 1 ┆ 2 ┆ null │
│ 11 ┆ 2 ┆ 2 ┆ 9 │
│ 2 ┆ 1 ┆ 3 ┆ null │
│ 5 ┆ 2 ┆ 3 ┆ 2 │
└─────┴───────┴─────┴──────────┘
I couldn't figure a way of using native expressions so I tried doing this using map_rows
and a UDF, even though Polars guide discourages the use of UDFs.
Is there maybe a native way to do this?
It looks like something you would express as a .join()
df.join(df.with_columns(pl.col.count + 1), on=["id", "count"], how="left")
shape: (6, 4)
┌─────┬───────┬─────┬───────────┐
│ val ┆ count ┆ id ┆ val_right │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═══════╪═════╪═══════════╡
│ 9 ┆ 1 ┆ 1 ┆ null │
│ 7 ┆ 2 ┆ 1 ┆ 9 │
│ 9 ┆ 1 ┆ 2 ┆ null │
│ 11 ┆ 2 ┆ 2 ┆ 9 │
│ 2 ┆ 1 ┆ 3 ┆ null │
│ 5 ┆ 2 ┆ 3 ┆ 2 │
└─────┴───────┴─────┴───────────┘
If there can be multiple matches, you may want to use .unique()
to reduce the search space.
df.unique(subset=["id", "count"], keep="last", maintain_order=True)