Consider the following pl.DataFrame
:
import polars as pl
df = pl.DataFrame(
{
"symbol": ["s1", "s1", "s2", "s2"],
"signal": [0, 1, 2, 0],
"trade": [None, 1, None, -1],
}
)
shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════╪════════╪═══════╡
│ s1 ┆ 0 ┆ null │
│ s1 ┆ 1 ┆ 1 │
│ s2 ┆ 2 ┆ null │
│ s2 ┆ 0 ┆ -1 │
└────────┴────────┴───────┘
Now, I need to group the dataframe by symbol
and check if the first row in every group in column signal
is not equal to 0 (zero). It this equals to True
, I need to replace the corresponding cell in column trade
with the value in the cell in signal
.
Here's what I am actually looking for:
shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════╪════════╪═══════╡
│ s1 ┆ 0 ┆ null │
│ s1 ┆ 1 ┆ 1 │
│ s2 ┆ 2 ┆ 2 │ <- copy value from the ``signal`` column
│ s2 ┆ 0 ┆ -1 │
└────────┴────────┴───────┘
For this, a when-then-otherwise construct may be used.
True
exactly for the first rows (create index on the fly using pl.int_range
) in each group with signal
not equal to 0.signal
or trade
column.df.with_columns(
trade=pl.when(
pl.col("signal") != 0,
pl.int_range(pl.len()) == 0,
).then("signal").otherwise("trade").over("symbol")
)
shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞════════╪════════╪═══════╡
│ s1 ┆ 0 ┆ null │
│ s1 ┆ 1 ┆ 1 │
│ s2 ┆ 2 ┆ 2 │
│ s2 ┆ 0 ┆ -1 │
└────────┴────────┴───────┘