
Number of rows within each window


I have some expressions that I will evaluate later, either inside or outside a window function. This normally works fine: I can build pl.col("x").max() now and add .over("y") later, and the same goes for pl.arange(0, pl.count()). The one expression where this does not work is pl.count().

If you try to window pl.count(), Polars raises an error:

import polars as pl

df = pl.DataFrame(dict(x=[1,1,0,0], y=[1,2,3,4]))
expression = pl.count()

df.with_columns([expression.over("x").alias("z")])
# exceptions.ComputeError: Cannot apply a window function, did not find a root column. This is likely due to a syntax error in this expression: count()

Is there a version of count that can handle being windowed? I know that I can do pl.col("x").count().over("x"), but then I have to know ahead of time what columns will exist, and the expressions and the window columns come from completely different parts of my code.


Solution

  • Upgrade to Polars >=0.14. Starting with that release, the code in the original question works without modification.

    import polars as pl
    
    df = pl.DataFrame(dict(x=[1,1,0,0], y=[1,2,3,4]))
    expression = pl.count()
    
    df.with_columns([expression.over("x").alias("z")])
    # shape: (4, 3)
    # ┌─────┬─────┬─────┐
    # │ x   ┆ y   ┆ z   │
    # │ --- ┆ --- ┆ --- │
    # │ i64 ┆ i64 ┆ u32 │
    # ╞═════╪═════╪═════╡
    # │ 1   ┆ 1   ┆ 2   │
    # │ 1   ┆ 2   ┆ 2   │
    # │ 0   ┆ 3   ┆ 2   │
    # │ 0   ┆ 4   ┆ 2   │
    # └─────┴─────┴─────┘