Rank within group in Polars?

I have a Polars dataframe like so:

c1	c2	c3
a	a	1
a	a	1
a	b	1
a	c	1
d	a	1
d	b	1

I am trying to assign a number to each group of (c2, c3) within c1, so that would look like this:

c1	c2	c3	rank
a	a	1	0
a	a	1	0
a	b	1	1
a	c	1	2
d	a	1	0
d	b	1	1

How do I accomplish this?

I see how to do a global ranking:

df.join(
    df.select(["c1", "c2", "c3"])
    .unique()
    .with_columns(rank=pl.int_range(1, pl.len() + 1)),
    on=["c1", "c2", "c3"]
)

but that is a global ranking, not one within each c1 group. I also wonder if it is possible to do this with over() instead of the groupby/join pattern.

Solution

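Create a struct of columns c2 and c3 using pl.struct("c2", "c3"), compute its dense rank within each c1 group using over("c1") (dense ranking gives identical (c2, c3) pairs the same rank and leaves no gaps), and then subtract 1 because ranks start from 1 by default: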

pl.struct("c2", "c3").rank("dense").over("c1") - 1

Full code:

import polars as pl

df = pl.DataFrame(
    {
        "c1": ["a", "a", "a", "a", "d", "d"],
        "c2": ["a", "a", "b", "c", "a", "b"],
        "c3": [1, 1, 1, 1, 1, 1],
    }
)

df2 = df.with_columns(rank=pl.struct("c2", "c3").rank("dense").over("c1") - 1)

print(df2)

Output:

┌─────┬─────┬─────┬──────┐
│ c1  ┆ c2  ┆ c3  ┆ rank │
│ --- ┆ --- ┆ --- ┆ ---  │
│ str ┆ str ┆ i64 ┆ u32  │
╞═════╪═════╪═════╪══════╡
│ a   ┆ a   ┆ 1   ┆ 0    │
│ a   ┆ a   ┆ 1   ┆ 0    │
│ a   ┆ b   ┆ 1   ┆ 1    │
│ a   ┆ c   ┆ 1   ┆ 2    │
│ d   ┆ a   ┆ 1   ┆ 0    │
│ d   ┆ b   ┆ 1   ┆ 1    │
└─────┴─────┴─────┴──────┘