
How to use cum_fold or cum_reduce to create stateful column


I'm trying to create a column that changes its value at every 1/True in a target column and keeps the previous value for every 0/False. For example, how do I get from this

import polars as pl

a = pl.DataFrame({'a': [1, 0, 0, 0, 1, 0, 0, 1]})
shape: (8, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 0   │
│ 0   │
│ 0   │
│ 1   │
│ 0   │
│ 0   │
│ 1   │
└─────┘

to this dataframe?

shape: (8, 2)
┌─────┬────────────┐
│ a   ┆ b          │
│ --- ┆ ---        │
│ i64 ┆ str        │
╞═════╪════════════╡
│ 1   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 1   ┆ new_value2 │
│ 0   ┆ new_value2 │
│ 0   ┆ new_value2 │
│ 1   ┆ new_value3 │
└─────┴────────────┘

Solution

  • In polars, fold, reduce, cum_fold and cum_reduce are horizontal expressions, meaning that they combine columns with each other row by row rather than walking down the elements of a single column.

    To achieve what you want, you can use cum_sum, which gives a running count that increases by one at every 1/True value and stays the same at every 0/False value.

    Then we combine that result with the format expression to get the string output you want.

    a.with_columns(
        pl.format("new_value_{}", pl.col("a").cum_sum())
    )
    
    shape: (8, 2)
    ┌─────┬─────────────┐
    │ a   ┆ literal     │
    │ --- ┆ ---         │
    │ i64 ┆ str         │
    ╞═════╪═════════════╡
    │ 1   ┆ new_value_1 │
    │ 0   ┆ new_value_1 │
    │ 0   ┆ new_value_1 │
    │ 0   ┆ new_value_1 │
    │ 1   ┆ new_value_2 │
    │ 0   ┆ new_value_2 │
    │ 0   ┆ new_value_2 │
    │ 1   ┆ new_value_3 │
    └─────┴─────────────┘