How to get the group_by keys with a loop?

Update: polars.dataframe.group_by.GroupBy has since been made iterable.

for i, g in df.group_by('id', 'sid'):
    ...

I need to do some fairly complicated processing on each group after grouping. In pandas, this can be written as follows:

for i, g in df.groupby(['id', 'sid']):
    pass

In polars, however, the groups function returns a DataFrame, which cannot be conveniently used in a for loop.

Solution

You could use partition_by. With as_dict=True, it returns a dictionary that maps the group_by keys to the partitioned DataFrames.

import polars as pl

df = pl.DataFrame({
    "groups": [1, 1, 2, 2, 2],
    "values": pl.int_range(5, eager=True)
})

part_dfs = df.partition_by("groups", as_dict=True)

print(part_dfs)

{(1,): shape: (2, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 1      ┆ 0      │
│ 1      ┆ 1      │
└────────┴────────┘, 
(2,): shape: (3, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 2      ┆ 2      │
│ 2      ┆ 3      │
│ 2      ┆ 4      │
└────────┴────────┘}

Note: the resulting keys are tuples, even for a single column, e.g. (1,).