Search code examples
pythondataframepython-polars

Replace a cell in a column based on a cell in another column in a polars DataFrame


Consider the following pl.DataFrame:

import polars as pl

df = pl.DataFrame(
    {
        "symbol": ["s1", "s1", "s2", "s2"],
        "signal": [0, 1, 2, 0],
        "trade": [None, 1, None, -1],
    }
)

shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ i64    ┆ i64   │
╞════════╪════════╪═══════╡
│ s1     ┆ 0      ┆ null  │
│ s1     ┆ 1      ┆ 1     │
│ s2     ┆ 2      ┆ null  │
│ s2     ┆ 0      ┆ -1    │
└────────┴────────┴───────┘

Now, I need to group the dataframe by symbol and check if the first row in every group in column signal is not equal to 0 (zero). It this equals to True, I need to replace the corresponding cell in column trade with the value in the cell in signal.

Here's what I am actually looking for:

shape: (4, 3)
┌────────┬────────┬───────┐
│ symbol ┆ signal ┆ trade │
│ ---    ┆ ---    ┆ ---   │
│ str    ┆ i64    ┆ i64   │
╞════════╪════════╪═══════╡
│ s1     ┆ 0      ┆ null  │
│ s1     ┆ 1      ┆ 1     │
│ s2     ┆ 2      ┆ 2     │ <- copy value from the ``signal`` column
│ s2     ┆ 0      ┆ -1    │
└────────┴────────┴───────┘

Solution

  • For this, a when-then-otherwise construct may be used.

    • We create a condition that evaluates to True exactly for the first rows (create index on the fly using pl.int_range) in each group with signal not equal to 0.
    • Based on that condition, we either select the value in the signal or trade column.
    df.with_columns(
        trade=pl.when(
            pl.col("signal") != 0,
            pl.int_range(pl.len()) == 0,
        ).then("signal").otherwise("trade").over("symbol")
    )
    
    shape: (4, 3)
    ┌────────┬────────┬───────┐
    │ symbol ┆ signal ┆ trade │
    │ ---    ┆ ---    ┆ ---   │
    │ str    ┆ i64    ┆ i64   │
    ╞════════╪════════╪═══════╡
    │ s1     ┆ 0      ┆ null  │
    │ s1     ┆ 1      ┆ 1     │
    │ s2     ┆ 2      ┆ 2     │
    │ s2     ┆ 0      ┆ -1    │
    └────────┴────────┴───────┘