Search code examples
python-polars

Python Polars: Number of Rows since last value within a group


Given a polars DataFrame
data = pl.DataFrame({"user_id": [1, 1, 1, 2, 2, 2], "login": [False, True, False, False, False, True]})

How could I add a column which adds the number of rows since the user last logged in, with any rows before a login for that user being set to None? Example output for the above data is
[None, 0, 1, None, None, 0]

I tried adapting the answers from here, but cannot get it working with groups


Solution

  • You could subtract the row number of the previous login.

    (df.with_row_index()
       .with_columns(distance = 
          pl.col.index - pl.when("login").then("index").forward_fill().over("user_id")
       )
    )
    
    shape: (6, 4)
    ┌───────┬─────────┬───────┬──────────┐
    │ index ┆ user_id ┆ login ┆ distance │
    │ ---   ┆ ---     ┆ ---   ┆ ---      │
    │ u32   ┆ i64     ┆ bool  ┆ u32      │
    ╞═══════╪═════════╪═══════╪══════════╡
    │ 0     ┆ 1       ┆ false ┆ null     │
    │ 1     ┆ 1       ┆ true  ┆ 0        │
    │ 2     ┆ 1       ┆ false ┆ 1        │
    │ 3     ┆ 2       ┆ false ┆ null     │
    │ 4     ┆ 2       ┆ false ┆ null     │
    │ 5     ┆ 2       ┆ true  ┆ 0        │
    └───────┴─────────┴───────┴──────────┘
    

    If the data is unsorted, you can use .int_range() as the row number instead.

    df.with_columns(distance =
       pl.int_range(pl.len()).over("user_id") 
       - pl.when("login").then(pl.int_range(pl.len())).forward_fill().over("user_id")
    )
    

    Details

    (df.with_row_index()
       .with_columns(
          new_index  = pl.when("login").then("index"),
          last_index = pl.when("login").then("index").forward_fill(),
          last_index_groupwise = pl.when("login").then("index").forward_fill().over("user_id")
       )
    )
    
    shape: (6, 6)
    ┌───────┬─────────┬───────┬───────────┬────────────┬──────────────────────┐
    │ index ┆ user_id ┆ login ┆ new_index ┆ last_index ┆ last_index_groupwise │
    │ ---   ┆ ---     ┆ ---   ┆ ---       ┆ ---        ┆ ---                  │
    │ u32   ┆ i64     ┆ bool  ┆ u32       ┆ u32        ┆ u32                  │
    ╞═══════╪═════════╪═══════╪═══════════╪════════════╪══════════════════════╡
    │ 0     ┆ 1       ┆ false ┆ null      ┆ null       ┆ null                 │
    │ 1     ┆ 1       ┆ true  ┆ 1         ┆ 1          ┆ 1                    │
    │ 2     ┆ 1       ┆ false ┆ null      ┆ 1          ┆ 1                    │
    │ 3     ┆ 2       ┆ false ┆ null      ┆ 1          ┆ null                 │
    │ 4     ┆ 2       ┆ false ┆ null      ┆ 1          ┆ null                 │
    │ 5     ┆ 2       ┆ true  ┆ 5         ┆ 5          ┆ 5                    │
    └───────┴─────────┴───────┴───────────┴────────────┴──────────────────────┘