
Add row count per group in polars


Is there a way to rewrite this:

import numpy
import polars

df = (polars
  .DataFrame(dict(
    j=numpy.random.randint(10, 99, 20),
    ))
  .with_row_index()
  .select(
    g=polars.col('index') // 3,
    j='j'
    )
  .with_columns(rn=1)
  .with_columns(
    rn=polars.col('rn').shift().fill_null(0).cum_sum().over('g')
    )
  )
print(df)

 g (u32)  j (i64)  rn (i32)
 0        47       0
 0        22       1
 0        82       2
 1        19       0
 1        85       1
 1        15       2
 2        89       0
 2        74       1
 2        26       2
 3        11       0
 3        86       1
 3        81       2
 4        16       0
 4        35       1
 4        60       2
 5        30       0
 5        28       1
 5        94       2
 6        21       0
 6        38       1
shape: (20, 3)

so that it adds the rn column without first having to add a column full of 1s? I.e., somehow rewrite this part:

  .with_columns(rn=1)
  .with_columns(
    rn=polars.col('rn').shift().fill_null(0).cum_sum().over('g')
    )

so that:

  .with_columns(rn=1)

is not required? Basically reduce two expressions to one.

Or any other / better way to add a row count per group?


Solution

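  • This can be done by generating a pl.int_range() from the length of each group, pl.len(), evaluated per group with .over("g"). Here pl refers to polars imported as pl (import polars as pl).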

    df.with_columns(rn = pl.int_range(pl.len()).over("g"))
    
    shape: (20, 3)
    ┌─────┬─────┬─────┐
    │ g   ┆ j   ┆ rn  │
    │ --- ┆ --- ┆ --- │
    │ u32 ┆ i64 ┆ i64 │
    ╞═════╪═════╪═════╡
    │ 0   ┆ 14  ┆ 0   │ # group_len = 3, range = [0, 1, 2]
    │ 0   ┆ 81  ┆ 1   │
    │ 0   ┆ 72  ┆ 2   │
    │ 1   ┆ 34  ┆ 0   │
    │ 1   ┆ 90  ┆ 1   │
    │ …   ┆ …   ┆ …   │
    │ 5   ┆ 26  ┆ 0   │
    │ 5   ┆ 44  ┆ 1   │
    │ 5   ┆ 27  ┆ 2   │
    │ 6   ┆ 70  ┆ 0   │
    │ 6   ┆ 86  ┆ 1   │
    └─────┴─────┴─────┘