How to select a row with minimum value over some other column

How to select a rows from dataframe based on minimum value in one column aggregated over another column. For example

a = pl.DataFrame({'a': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3], 'b': [10, 5, 10, 6, 5, 4, 5, 30, 25, 24, 30, 30], 'c': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]})
print(a)
-----------------
shape: (12, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 10  ┆ 1   │
│ 1   ┆ 5   ┆ 2   │
│ 1   ┆ 10  ┆ 3   │
│ 2   ┆ 6   ┆ 4   │
│ 2   ┆ 5   ┆ 5   │
│ 2   ┆ 4   ┆ 6   │
│ 2   ┆ 5   ┆ 7   │
│ 3   ┆ 30  ┆ 8   │
│ 3   ┆ 25  ┆ 9   │
│ 3   ┆ 24  ┆ 10  │
│ 3   ┆ 30  ┆ 11  │
│ 3   ┆ 30  ┆ 12  │
└─────┴─────┴─────┘

I need the following dataframe

shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 2   │
│ 2   ┆ 4   ┆ 6   │
│ 3   ┆ 24  ┆ 10  │
└─────┴─────┴─────┘

so it's like groupby(by='a'), them min('b'), but I need to select entire row with minimum b.

Thanks.

Solution

You could group_by and .sort_by() + .first()

(df.group_by("a", maintain_order=True)
   .agg(pl.all().sort_by("b").first())
)

shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 5   ┆ 2   │
│ 2   ┆ 4   ┆ 6   │
│ 3   ┆ 24  ┆ 10  │
└─────┴─────┴─────┘