I'm trying to create a column that changes its value at every 1/True in a target column and keeps the previous value for 0/False. For example, how do I get from this
import polars as pl
a = pl.DataFrame({'a': [1, 0, 0, 0, 1, 0, 0, 1]})
shape: (8, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 0   │
│ 0   │
│ 0   │
│ 1   │
│ 0   │
│ 0   │
│ 1   │
└─────┘
to this dataframe?
shape: (8, 2)
┌─────┬────────────┐
│ a   ┆ b          │
│ --- ┆ ---        │
│ i64 ┆ str        │
╞═════╪════════════╡
│ 1   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 0   ┆ new_value1 │
│ 1   ┆ new_value2 │
│ 0   ┆ new_value2 │
│ 0   ┆ new_value2 │
│ 1   ┆ new_value3 │
└─────┴────────────┘
In polars, fold, reduce, cum_fold and cum_reduce are horizontal expressions, meaning that they operate on columns, not on the elements within a column.
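To achieve what you want, you can use cum_sum, which gives a running count that increases by one at every 1/True value and stays constant on every 0/False value. Then we combine that result with the format expression to build the string output you want. Since the expression is not aliased, the new column is named literal.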
a.with_columns(
pl.format("new_value_{}", pl.col("a").cum_sum())
)
shape: (8, 2)
┌─────┬─────────────┐
│ a   ┆ literal     │
│ --- ┆ ---         │
│ i64 ┆ str         │
╞═════╪═════════════╡
│ 1   ┆ new_value_1 │
│ 0   ┆ new_value_1 │
│ 0   ┆ new_value_1 │
│ 0   ┆ new_value_1 │
│ 1   ┆ new_value_2 │
│ 0   ┆ new_value_2 │
│ 0   ┆ new_value_2 │
│ 1   ┆ new_value_3 │
└─────┴─────────────┘