Search code examples
pythondataframepython-polars

Python Polars Window Function With Literal Type


Say I have a DataFrame with an id column like this:

df = pl.from_repr("""
┌─────┐
│ id  │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 1   │
│ 1   │
│ 2   │
│ 2   │
│ 3   │
│ 3   │
└─────┘
""")

I want to aggregate a running count over the id column, giving this result:

┌─────┬───────┐
│ id  ┆ count │
│ --- ┆ ---   │
│ i64 ┆ i64   │
╞═════╪═══════╡
│ 1   ┆ 1     │
│ 1   ┆ 2     │
│ 1   ┆ 3     │
│ 2   ┆ 1     │
│ 2   ┆ 2     │
│ 3   ┆ 1     │
│ 3   ┆ 2     │
└─────┴───────┘

My attempt involved creating a dummy column, which I think produced the desired result but seems a bit hacky.

(
    df.with_columns(
        pl.lit(1).alias("ones")
    )
    .with_columns(
        (pl.col("ones").cum_sum().over("id")).alias("count")
    )
    .drop("ones")
)

However when I try this:

(
    df.with_columns(
        (pl.lit(1).cum_sum().over("id")).alias("count")
    )
    .drop("ones")
)

I get the error "ComputeError: the length of the window expression did not match that of the group".

Is there a better way to do this? What am I missing in my attempt above?


Solution

  • cum_count seems to do the job with a 1-shift adjustment (or any shift you want, of course).

    There is also int_range:

    df.with_columns(
       count = pl.col('id').cum_count().over('id'),
       count2 = pl.int_range(pl.len()).over('id') + 1
    )
    
    shape: (7, 3)
    ┌─────┬───────┬────────┐
    │ id  ┆ count ┆ count2 │
    │ --- ┆ ---   ┆ ---    │
    │ i64 ┆ u32   ┆ i64    │
    ╞═════╪═══════╪════════╡
    │ 1   ┆ 1     ┆ 1      │
    │ 1   ┆ 2     ┆ 2      │
    │ 1   ┆ 3     ┆ 3      │
    │ 2   ┆ 1     ┆ 1      │
    │ 2   ┆ 2     ┆ 2      │
    │ 3   ┆ 1     ┆ 1      │
    │ 3   ┆ 2     ┆ 2      │
    └─────┴───────┴────────┘