
How to group dataframe rows into list in polars group_by


import polars as pl

df = pl.DataFrame(
    {
        'Letter': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D','D','E'],
        'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9,10]
    }
)

I want to group by Letter and collect the corresponding Value entries into a list.

Related Pandas question: How to group dataframe rows into list in pandas groupby

I know the pandas idiom does not work here:

df.group_by('Letter')['Value'].apply(list)

TypeError: 'GroupBy' object is not subscriptable

The expected output is:

│ A      ┆ [1, 2]    │
│ B      ┆ [3, 4, 5] │
│ C      ┆ [6, 7]    │
│ D      ┆ [8, 9]    │
│ E      ┆ [10]      │

Solution

  • Pass the column expression to agg without any aggregation function; Polars then collects each group's values into a list. Set maintain_order=True if you want the order of the groups to be consistent with the input data.

    import polars as pl
    
    df = pl.DataFrame(
        {
            'Letter': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D','D','E'],
            'Value': [1, 2, 3, 4, 5, 6, 7, 8, 9,10]
        }
    )
    g = df.group_by('Letter', maintain_order=True).agg(pl.col('Value'))
    print(g)
    

    This prints:

    ┌────────┬───────────┐
    │ Letter ┆ Value     │
    │ ---    ┆ ---       │
    │ str    ┆ list[i64] │
    ╞════════╪═══════════╡
    │ A      ┆ [1, 2]    │
    │ B      ┆ [3, 4, 5] │
    │ C      ┆ [6, 7]    │
    │ D      ┆ [8, 9]    │
    │ E      ┆ [10]      │
    └────────┴───────────┘