How to use cum_fold or cum_reduce to create a stateful column

I'm trying to create a column that changes its value for every 1/True in a target column and keeps the previous value for every 0/False. For example, how do I get from this

import polars as pl
a = pl.DataFrame({'a': [1, 0, 0, 0, 1, 0, 0, 1]})

shape: (8, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 0   │
│ 0   │
│ 0   │
│ 1   │
│ 0   │
│ 0   │
│ 1   │
└─────┘

to this dataframe

shape: (8, 2)
┌─────┬────────────┐
│ a   ┆ b          │
│ --- ┆ ---        │
│ i64 ┆ str        │
╞═════╪════════════╡
│ 1   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 1   ┆ new_value2 │
│ 0   ┆ new_value2 │
│ 0   ┆ new_value2 │
│ 1   ┆ new_value3 │
└─────┴────────────┘

Solution

In polars, fold, reduce, cum_fold and cum_reduce are horizontal expressions, meaning that they operate across columns within each row, not down the elements of a single column.

To achieve what you want, you can use cum_sum on the target column. Because the column only contains 0 and 1, the running total increases by one at every 1/True and stays the same at every 0/False.

Then we combine that result with the format expression to get the string output you want. Without an alias, the new column is named literal.

a.with_columns(
    pl.format("new_value_{}", pl.col("a").cum_sum())
)

shape: (8, 2)
┌─────┬─────────────┐
│ a   ┆ literal     │
│ --- ┆ ---         │
│ i64 ┆ str         │
╞═════╪═════════════╡
│ 1   ┆ new_value_1 │
│ 0   ┆ new_value_1 │
│ 0   ┆ new_value_1 │
│ 0   ┆ new_value_1 │
│ 1   ┆ new_value_2 │
│ 0   ┆ new_value_2 │
│ 0   ┆ new_value_2 │
│ 1   ┆ new_value_3 │
└─────┴─────────────┘