Search code examples
pythonpandassortingpandas-groupbycolumnsorting

How to sort a group in a way that I get the largest number in the first row and smallest in the second and the second largest in the third and so on


So I have a df like this

In [1]:data= {'Group': ['A','A','A','A','A','A','B','B','B','B'],
    'Name': [ ' Sheldon Webb',' Traci Dean',' Chad Webster',' Ora Harmon',' Elijah Mendoza',' June Strickland',' Beth Vasquez',' Betty Sutton',' Joel Gill',' Vernon Stone'],
    'Performance':[33,64,142,116,122,68,95,127,132,80]}
In [2]:df = pd.DataFrame(data, columns = ['Group', 'Name','Performance'])

Out[1]:
    Group  Name             Performance
0    A     Sheldon Webb       33
1    A     Traci Dean         64
2    A     Chad Webster      142
3    A     Ora Harmon        116
4    A     Elijah Mendoza    122
5    A     June Strickland    68
6    B     Beth Vasquez       95
7    B     Betty Sutton      127
8    B     Joel Gill         132
9    B     Vernon Stone       80

I want to sort it in such an alternating way that within a group, say group "A", the first row should have its highest performing person (in this case "Chad Webster") and then in the second row the least performing (which is "Sheldon Webb").

The output I am looking for would look something like this:

Out[2]:
    Group   Name             Performance
0    A     Chad Webster      142
1    A     Sheldon Webb       33
2    A     Elijah Mendoza    122
3    A     Traci Dean         64
4    A     Ora Harmon        116
5    A     June Strickland    68
6    B     Joel Gill         132
7    B     Vernon Stone       80
8    B     Betty Sutton      127
9    B     Beth Vasquez       95

You can see the sequence is alternating between the highest and lowest within a group.


Solution

  • Take the sorted order and then apply a quadratic function to it where the root is 1/2 the length of the array (plus some small offset). This way the highest rank is given to the extremal values (the sign of the eps offset determines whether you want a the highest value ranked above the lowest value). I added a small group at the end to show how it properly handles repeated values or an odd group size.

    def extremal_rank(s):
        eps = 10**-4
        y = (pd.Series(np.arange(1, len(s)+1), index=s.sort_values().index) 
             - (len(s)+1)/2 + eps)**2
        return y.reindex_like(s)
        
    df['rnk'] = df.groupby('Group')['Performance'].apply(extremal_rank)
    df = df.sort_values(['Group', 'rnk'], ascending=[True, False])
    

       Group              Name  Performance     rnk
    2      A      Chad Webster          142  6.2505
    0      A      Sheldon Webb           33  6.2495
    4      A    Elijah Mendoza          122  2.2503
    1      A        Traci Dean           64  2.2497
    3      A        Ora Harmon          116  0.2501
    5      A   June Strickland           68  0.2499
    8      B         Joel Gill          132  2.2503
    9      B      Vernon Stone           80  2.2497
    7      B      Betty Sutton          127  0.2501
    6      B      Beth Vasquez           95  0.2499
    11     C                 b          110  9.0006
    12     C                 c           68  8.9994
    10     C                 a          110  4.0004
    13     C                 d           68  3.9996
    15     C                 f           70  1.0002
    16     C                 g           70  0.9998
    14     C                 e           70  0.0000