Search code examples
pythonpandasdataframesorting

How to sort a dataframe based on multi-column and drop rows that do not sort


For example, if I had a dataframe that looks like

A B
1 8
4 9
2 5
6 2

Is there a easier way to sort the two rows where Col-A is in ascending order and B in descending order and any row that B cannot be ordered in descending be dropped? The second row should be dropped because 9 is greater than 5.

I am expecting a table that looks like

A B
1 8
2 5
6 2

Solution

  • You can sort_values, then compute a cummin, drop the rows that violate the cummin:

    tmp = df.sort_values(['A','B'], ascending=[True, False])
    
    m = tmp['B'].le(tmp['B'].cummin())
    
    out = tmp[m]
    
    # or to keep the original DataFrame
    out = df.loc[m[m].index]
    

    Output:

       A  B
    0  1  8
    2  2  5
    3  6  2