If I have a single column, I can sort that column within groups using the over
method. For example,
import polars as pl
df = pl.DataFrame({'group': [2,2,1,1,2,2], 'value': [3,4,3,1,1,3]})
df.with_columns(pl.col('value').sort().over('group'))
# shape: (6, 2)
# ┌───────┬───────┐
# │ group ┆ value │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═══════╪═══════╡
# │ 2 ┆ 1 │
# │ 2 ┆ 3 │
# │ 1 ┆ 1 │
# │ 1 ┆ 3 │
# │ 2 ┆ 3 │
# │ 2 ┆ 4 │
# └───────┴───────┘
What is nice about this operation is that it maintains the order of the groups (e.g. group=1 is still rows 3 and 4; group=2 is still rows 1, 2, 5, and 6).
But this only works to sort a single column. How do I sort an entire table like this? I tried the things below, but none of them worked:
import polars as pl
df = pl.DataFrame({'group': [2,2,1,1,2,2], 'value': [3,4,3,1,1,3], 'value2': [5,4,3,2,1,0]})
df.group_by('group').sort('value', 'value2')
# AttributeError: 'GroupBy' object has no attribute 'sort'
df.sort(pl.col('value').over('group'), pl.col('value2').over('group'))
# does not sort within groups
# Looking for this:
# shape: (6, 3)
# ┌───────┬───────┬────────┐
# │ group ┆ value ┆ value2 │
# │ --- ┆ --- ┆ --- │
# │ i64 ┆ i64 ┆ i64 │
# ╞═══════╪═══════╪════════╡
# │ 2 ┆ 1 ┆ 1 │
# │ 2 ┆ 3 ┆ 0 │
# │ 1 ┆ 1 ┆ 2 │
# │ 1 ┆ 3 ┆ 3 │
# │ 2 ┆ 3 ┆ 5 │
# │ 2 ┆ 4 ┆ 4 │
# └───────┴───────┴────────┘
To sort an entire table within groups, use pl.all().sort_by(sort_columns).over(group_columns):
import polars as pl
df = pl.DataFrame({
'group': [2,2,1,1,2,2],
'value': [3,4,3,1,1,3],
'value2': [5,4,3,2,1,0],
})
df.select(pl.all().sort_by(['value','value2']).over('group'))