Search code examples
python-polars

Count consecutive True (or 1) values in a Boolean (or numeric) column with Polars?


I am hoping to count consecutive values in a column, preferably using Polars expressions.

import polars
df = pl.DataFrame(
   {"value": [True,True,True,False,False,True,False,False,True,True]}
)

With the example data frame above, I would like to count the number of consecutive True values.

Below is an example using R's Data.Table package.

library(data.table)
dt <- data.table(value = c(T,T,T,F,F,T,F,F,T,T))
dt[, value2 := fifelse((1:.N) == .N & value == 1, .N, NA_integer_), by = rleid(value)]
dt

Desired result displayed as a Polars DataFrame

shape: (10, 2)
┌───────┬──────┐
│ value ┆ len  │
│ ---   ┆ ---  │
│ bool  ┆ u32  │
╞═══════╪══════╡
│ true  ┆ null │
│ true  ┆ null │
│ true  ┆ 3    │
│ false ┆ null │
│ false ┆ null │
│ true  ┆ 1    │
│ false ┆ null │
│ false ┆ null │
│ true  ┆ null │
│ true  ┆ 2    │
└───────┴──────┘

Any ideas how this would be done efficiently using Polars?


Solution

  • Polars has an rle_id() method similar to the R example.

    We can use that as our "group id", and get the pl.len() of each group with .over()

    df.with_columns(
       pl.len().over(pl.col("value").rle_id())
    )
    
    shape: (10, 2)
    ┌───────┬─────┐
    │ value ┆ len │
    │ ---   ┆ --- │
    │ bool  ┆ u32 │
    ╞═══════╪═════╡
    │ true  ┆ 3   │
    │ true  ┆ 3   │
    │ true  ┆ 3   │
    │ false ┆ 2   │
    │ false ┆ 2   │
    │ true  ┆ 1   │
    │ false ┆ 2   │
    │ false ┆ 2   │
    │ true  ┆ 2   │
    │ true  ┆ 2   │
    └───────┴─────┘
    

    If you want to null out the other rows, one way could be .is_last_distinct() inside a .when().then()

    df.with_columns(
       pl.when(pl.col("value") & pl.col("value").is_last_distinct())
         .then(pl.len())
         .over(pl.col("value").rle_id())
    )
    
    shape: (10, 2)
    ┌───────┬──────┐
    │ value ┆ len  │
    │ ---   ┆ ---  │
    │ bool  ┆ u32  │
    ╞═══════╪══════╡
    │ true  ┆ null │
    │ true  ┆ null │
    │ true  ┆ 3    │
    │ false ┆ null │
    │ false ┆ null │
    │ true  ┆ 1    │
    │ false ┆ null │
    │ false ┆ null │
    │ true  ┆ null │
    │ true  ┆ 2    │
    └───────┴──────┘