Search code examples
pythonsortinggroup-by

Python: Sorting by specific columns and automatically sorting other columns


I have slice of my dataframe like this and I want to sort values by person and buy date, then all other columns are sorted ascending automatically.

df = pd.DataFrame({'person': [1, 1, 1, 2, 3, 3],
                   'brand': ['b', 'a', 'a', 'b', 'a', 'b'],
                   'product': ['A1', 'A1', 'A2', 'B1', 'C1', 'C2'],
                   'buydate': ['20220101', '20220101', '20220401', '20220601', '20220630', '20221201'],
                   'type': ['2', '1', '1', '1', '1', '2'],
                   'price': [50, 20, 200, 300, 20, 150],})

and I expected:

person brand product buydate type price
0 1 a A1 20220101 1 20
1 1 b A1 20220101 2 50
2 1 a A2 20220401 1 200
3 2 b B1 20220601 1 300
4 3 a C1 20220630 1 20
5 3 b C2 20221201 2 150

I have tried

df = df.sort_values(['person', 'buydate'])

but I only got:

person brand product buydate type price
0 1 b A1 20220101 2 50
1 1 a A1 20220101 1 20
2 1 a A2 20220401 1 200
3 2 b B1 20220601 1 300
4 3 a C1 20220630 1 20
5 3 b C2 20221201 2 150

My dataset is pretty big and contains a lot of columns, so it is highly appreciated if any solutions with short runtime. While using lambda is required too much time, any solutions using lambda is also welcomed!

Thank you so much!


Solution

  • You can craft the sorting list using Index.difference:

    cols = ['person', 'brand']
    
    out = df.sort_values(by=cols+list(df.columns.difference(cols)))
    

    If you want to use a custom order with a default of ascending for the other columns:

    cols = ['person', 'brand']
    
    out = df.sort_values(by=cols+(l:=list(df.columns.difference(cols))),
                         ascending=[True, True]+[True]*len(l)
                        )
    
    # or for python <3.8 that doesn't support the walrus operator
    # cols = ['person', 'brand']
    # order = [True, True]
    # default_order = True
    # out = df.sort_values(by=cols+list(df.columns.difference(cols)),
    #                      ascending=order+[default_order]*len(set(df)-set(cols))
    #                     )
    

    Output:

       person brand product   buydate type  price
    1       1     a      A1  20220101    1     20
    2       1     a      A2  20220401    1    200
    0       1     b      A1  20220101    2     50
    3       2     b      B1  20220601    1    300
    4       3     a      C1  20220630    1     20
    5       3     b      C2  20221201    2    150