
How to count consecutive 1s with the Polars library in Python?


Here is some Polars code along with test data:

import polars as pl

data = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0]

df = pl.DataFrame({"test": data})

I want to achieve the following result:

┌──────┬────────┐
│ test ┆ result │
│ ---  ┆ ---    │
│ i64  ┆ u32    │
╞══════╪════════╡
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 1    ┆ 2      │
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 1    ┆ 2      │
│ 1    ┆ 3      │
│ 0    ┆ 0      │
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 0    ┆ 0      │
│ 1    ┆ 1      │
│ 1    ┆ 2      │
│ 1    ┆ 3      │
│ 1    ┆ 4      │
│ 0    ┆ 0      │
└──────┴────────┘

The desired result keeps the original 0 values unchanged, counts up through each run of consecutive 1s, and resets the count to 0 whenever a 0 is encountered.

The data set is too large for a Python for loop (or anything similar) to be practical.

How can I do this using only Polars functions, without numpy or other libraries?


Solution

  • One way to look at the output is as a cumulative count within groups, where a new group starts every time a 0 appears in the input. With that in mind, you can build the following expression:

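    df.with_columns(
        (
            pl.col("test")
            .cum_count()
            # group key: running count of the 0s seen so far
            .over(pl.when(pl.col("test") == 0).then(1).cum_sum().forward_fill())
            - 1  # cum_count starts at 1 within each group, so shift it to 0
        ).alias("result")
    )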
    

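    Inside over, the when/then produces a literal 1 on every row where test is 0 and null elsewhere. Taking the cum_sum of that column and forward-filling the nulls gives each 0 and the 1s that follow it a shared group id, which creates the groups we need.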