Here is a piece of code using the Polars library, along with some test data:
import polars as pl
data = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0]
df = pl.DataFrame({"test": data})
I want to achieve the following result:
┌──────┬────────┐
│ test ┆ result │
│ ---  ┆ ---    │
│ i64  ┆ u32    │
╞══════╪════════╡
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 1    ┆ 2      │
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 1    ┆ 2      │
│ 1    ┆ 3      │
│ 0    ┆ 0      │
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 1    ┆ 2      │
│ 1    ┆ 3      │
│ 1    ┆ 4      │
│ 0    ┆ 0      │
└──────┴────────┘
The desired result keeps every 0 from the input unchanged, counts up across each run of consecutive 1s, and restarts the count from 1 after each 0.
The dataset is far too large for a Python for loop to be practical.
How can I achieve this using only Polars expressions, without numpy or other libraries?
One way to look at the output is as a cumulative count by group, where a new group starts every time a 0 appears in the input. With that framing you can build the following expression:
df.with_columns(
    (
        pl.col("test")
        .cum_count()
        .over(pl.when(pl.col("test") == 0).then(1).cum_sum().forward_fill())
        - 1
    ).alias("result")
)
The cum_sum in the over expression runs on a literal column that holds 1 wherever the input is 0 and null elsewhere; cumulatively summing it and forward-filling the nulls creates the group ids we need.