Update: polars.dataframe.group_by.GroupBy has since been made iterable.
for i, g in df.group_by('id', 'sid'):
    ...
I need to do some somewhat complicated processing for each group after grouping.
In pandas, it can be written as follows:
for i, g in df.groupby(['id', 'sid']):
    pass
In polars, however, the groups function returns a DataFrame, which cannot be conveniently used in a for loop.
You could use partition_by. With as_dict=True, this yields a dictionary where the group_by keys map to the partitioned DataFrames.
import polars as pl

df = pl.DataFrame({
    "groups": [1, 1, 2, 2, 2],
    "values": pl.int_range(5, eager=True)
})
# as_dict=True returns {key: DataFrame} instead of a list of DataFrames
part_dfs = df.partition_by("groups", as_dict=True)
print(part_dfs)
{(1,): shape: (2, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 1      ┆ 0      │
│ 1      ┆ 1      │
└────────┴────────┘,
 (2,): shape: (3, 2)
┌────────┬────────┐
│ groups ┆ values │
│ ---    ┆ ---    │
│ i64    ┆ i64    │
╞════════╪════════╡
│ 2      ┆ 2      │
│ 2      ┆ 3      │
│ 2      ┆ 4      │
└────────┴────────┘}
Note: the resulting keys are tuples, even when partitioning by a single column.