
Rank within group in Polars?


I have a Polars dataframe like so:

c1 c2 c3
a a 1
a a 1
a b 1
a c 1
d a 1
d b 1

I am trying to number each distinct (c2, c3) combination within each c1 group, so that the result looks like this:

c1 c2 c3 rank
a a 1 0
a a 1 0
a b 1 1
a c 1 2
d a 1 0
d b 1 1

How do I accomplish this?

I see how to do a global ranking:

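import polars as pl

# Number every distinct (c1, c2, c3) row 1, 2, 3, ... across the whole frame
df.join(
    df.select(["c1", "c2", "c3"])
    .unique(maintain_order=True)
    .with_columns(rank=pl.int_range(1, pl.len() + 1)),
    on=["c1", "c2", "c3"],
)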

but that is a global ranking, not one within the c1 group. I also wonder if it is possible to do this with over() instead of the unique/join pattern.


Solution

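  • Create a struct of columns c2, c3 using pl.struct("c2", "c3"), compute its dense rank within each c1 group with .over("c1"), and then subtract 1, because ranks start at 1 by default. A dense rank gives identical (c2, c3) pairs the same number and leaves no gaps between consecutive numbers: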

    pl.struct("c2", "c3").rank("dense").over("c1") - 1
    

    Full code:

    import polars as pl
    
    df = pl.DataFrame(
        {
            "c1": ["a", "a", "a", "a", "d", "d"],
            "c2": ["a", "a", "b", "c", "a", "b"],
            "c3": [1, 1, 1, 1, 1, 1],
        }
    )
    
    df2 = df.with_columns(rank=pl.struct("c2", "c3").rank("dense").over("c1") - 1)
    
    print(df2)
    

    Output:

    ┌─────┬─────┬─────┬──────┐
    │ c1  ┆ c2  ┆ c3  ┆ rank │
    │ --- ┆ --- ┆ --- ┆ ---  │
    │ str ┆ str ┆ i64 ┆ u32  │
    ╞═════╪═════╪═════╪══════╡
    │ a   ┆ a   ┆ 1   ┆ 0    │
    │ a   ┆ a   ┆ 1   ┆ 0    │
    │ a   ┆ b   ┆ 1   ┆ 1    │
    │ a   ┆ c   ┆ 1   ┆ 2    │
    │ d   ┆ a   ┆ 1   ┆ 0    │
    │ d   ┆ b   ┆ 1   ┆ 1    │
    └─────┴─────┴─────┴──────┘