python, python-polars

How to get the group_by keys with a loop?


Update: polars.dataframe.group_by.GroupBy has since been made iterable, so the groups can be looped over directly:

for i, g in df.group_by('id', 'sid'):
    ...

I need to do some fairly complicated processing on each group after grouping. In pandas, this can be written as follows:

for i, g in df.groupby(['id', 'sid']):
    pass

In polars, however, the groups function returns a DataFrame, which cannot conveniently be used in a for loop.


Solution

  • You could use partition_by. With as_dict=True, it returns a dictionary that maps each group key to the corresponding partitioned DataFrame.

    import polars as pl
    
    df = pl.DataFrame({
        "groups": [1, 1, 2, 2, 2],
        "values": pl.int_range(5, eager=True),
    })
    
    # Split the frame into one DataFrame per distinct value of "groups"
    part_dfs = df.partition_by("groups", as_dict=True)
    
    print(part_dfs)
    
    {(1,): shape: (2, 2)
    ┌────────┬────────┐
    │ groups ┆ values │
    │ ---    ┆ ---    │
    │ i64    ┆ i64    │
    ╞════════╪════════╡
    │ 1      ┆ 0      │
    │ 1      ┆ 1      │
    └────────┴────────┘, 
    (2,): shape: (3, 2)
    ┌────────┬────────┐
    │ groups ┆ values │
    │ ---    ┆ ---    │
    │ i64    ┆ i64    │
    ╞════════╪════════╡
    │ 2      ┆ 2      │
    │ 2      ┆ 3      │
    │ 2      ┆ 4      │
    └────────┴────────┘}
    

    Note: the resulting keys are tuples, even when partitioning by a single column.