Search code examples
pandasdataframewindowsql-order-bypartition

Order DataFrame By Group of Category as Window


However I found several similar questions/answers, none demonstrated a "simple" manner to order a pandas dataframe by groups of categories and minimum value like window partitions. The image represents pretty well the goal.

data = {
    'Section': ['A', 'C', 'C', 'B', 'A', 'D', 'B', 'D'],
    'Content': [3, 5, 2, 3, 1, 1, 0, 2]
}

df_initial = pd.DataFrame(data)


data = {
    'Section': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'Content': [1, 0, 2, 1, 3, 3, 5, 2]
}

df_sorted = pd.DataFrame(data)

This example was the nearest one I found by keeping the whole group together

Ranking order per group in Pandas

Cheers

enter image description here


Solution

  • Try this:

    df['sortkey'] = df.sort_values(['Content']).groupby('Section')['Content'].cumcount()
    df.sort_values(['sortkey', 'Section'])
    

    Output:

      Section  Content  sortkey
    4       A        1        0
    6       B        0        0
    2       C        2        0
    5       D        1        0
    0       A        3        1
    3       B        3        1
    1       C        5        1
    7       D        2        1
    

    Details:

    • Sort by Content first to order
    • Groupby section and use cumcount to get order in each section
    • Lastly, sort the dataframe first my the new created sortkey and section.
    • You can drop sortkey add .drop('sortkey', axis=1)