import polars as pl
df = pl.DataFrame({
    "Letter": ["A", "A", "B", "B", "B", "C", "C", "D", "D", "E"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
})
I want to group by Letter and collect the corresponding Value entries of each group into a list.
Related Pandas question: How to group dataframe rows into list in pandas groupby
I know the pandas approach does not work here:
df.group_by("a")["b"].apply(list)
TypeError: 'GroupBy' object is not subscriptable
The expected output is:
┌────────┬───────────┐
│ Letter ┆ Value     │
│ ---    ┆ ---       │
│ str    ┆ list[i64] │
╞════════╪═══════════╡
│ A      ┆ [1, 2]    │
│ B      ┆ [3, 4, 5] │
│ C      ┆ [6, 7]    │
│ D      ┆ [8, 9]    │
│ E      ┆ [10]      │
└────────┴───────────┘
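You can pass the column expression straight to agg. Inside a group_by context, pl.col('Value') with no further aggregation collects each group's values into a list: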
import polars as pl
df = pl.DataFrame(
{
'Letter': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D','D','E'],
'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9,10]
}
)
g = df.group_by('Letter', maintain_order=True).agg(pl.col('Value'))
print(g)
This will print:
┌────────┬───────────┐
│ Letter ┆ Value     │
│ ---    ┆ ---       │
│ str    ┆ list[i64] │
╞════════╪═══════════╡
│ A      ┆ [1, 2]    │
│ B      ┆ [3, 4, 5] │
│ C      ┆ [6, 7]    │
│ D      ┆ [8, 9]    │
│ E      ┆ [10]      │
└────────┴───────────┘
maintain_order=True is required if you want the order of the groups to be consistent with the input data. Without it, the groups may come back in any order.