python-polars

Sort within groups on entire table


If I have a single column, I can sort that column within groups using the over method. For example,

import polars as pl

df = pl.DataFrame({'group': [2,2,1,1,2,2], 'value': [3,4,3,1,1,3]})
 
df.with_columns(pl.col('value').sort().over('group'))
# shape: (6, 2)
# ┌───────┬───────┐
# │ group ┆ value │
# │ ---   ┆ ---   │
# │ i64   ┆ i64   │
# ╞═══════╪═══════╡
# │ 2     ┆ 1     │
# │ 2     ┆ 3     │
# │ 1     ┆ 1     │
# │ 1     ┆ 3     │
# │ 2     ┆ 3     │
# │ 2     ┆ 4     │
# └───────┴───────┘

What is nice about this operation is that it maintains the order of the groups (e.g. group=1 is still rows 3 and 4; group=2 is still rows 1, 2, 5, and 6).

But this only works to sort a single column. How do I sort an entire table like this? I tried the two approaches below, but neither worked:

import polars as pl

df = pl.DataFrame({'group': [2,2,1,1,2,2], 'value': [3,4,3,1,1,3], 'value2': [5,4,3,2,1,0]})

df.group_by('group').sort('value', 'value2')
# AttributeError: 'GroupBy' object has no attribute 'sort'

df.sort(pl.col('value').over('group'), pl.col('value2').over('group'))
# does not sort within groups

# Looking for this:
# shape: (6, 3)
# ┌───────┬───────┬────────┐
# │ group ┆ value ┆ value2 │
# │ ---   ┆ ---   ┆ ---    │
# │ i64   ┆ i64   ┆ i64    │
# ╞═══════╪═══════╪════════╡
# │ 2     ┆ 1     ┆ 1      │
# │ 2     ┆ 3     ┆ 0      │
# │ 1     ┆ 1     ┆ 2      │
# │ 1     ┆ 3     ┆ 3      │
# │ 2     ┆ 3     ┆ 5      │
# │ 2     ┆ 4     ┆ 4      │
# └───────┴───────┴────────┘

Solution

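  • To sort an entire table within groups, use pl.all().sort_by(sort_columns).over(group_columns). The expression reorders every column by the sort columns inside each group, and over places the sorted rows back at that group's original row positions.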

    import polars as pl
    
    df = pl.DataFrame({
      'group': [2,2,1,1,2,2],
      'value': [3,4,3,1,1,3],
      'value2': [5,4,3,2,1,0],
    })
    
    df.select(pl.all().sort_by(['value','value2']).over('group'))