python pandas dataframe sorting group-by

How to sort and group together the elements in a pandas dataframe in the way I want?

For instance, I create a pandas dataframe:

data = [['nick', 3, '2023/11/22 10:05:00', 'A'], ['tom', 3, '2023/11/22 9:25:00','A'], ['juli', 2, '2023/11/22 12:05:00', 'B'], ['ewa', 4, '2023/11/22 9:55:00', 'B'],['peter', 5, '2023/11/22 11:00:00', 'A'], ['johan', 5, '2023/11/22 9:00:00', 'C']]
col_name = ['name', 'score', 'time', 'category'] 
df = df = pd.DataFrame(data, columns=col_name)

There are six students from different groups, and they have different scores in an exam. I want to group together the students from the same group and then sort them in the group from the highest score to lowest. Then I want to put the group with the highest score in the beginning of the dataframe followed by the group with lower score. If two students have the same score, the one with earlier time should be put first.

I tried the code:

df_sorted=df.sort_values(['group','score'],ascending=False).groupby('group').apply(lambda x: x)

It groups together the students and sorts them in the same group. The new dataframe I have now:

It only partly achieved what I need to do. Peter has a higher score than Eva and Juli, so Peter, Nick and Tom should be put right under Johan, followed by Ewa and Juli. Tom should be put in front of Nick since the time is earlier.

What should I do next to achieve the goal?

Solution

You can add an extra column with groupby.transform to add a group score (here max), then use this to sort the groups:

out = (df
   .assign(max=df.groupby('category')['score'].transform('max'),
           dt=pd.to_datetime(df['time'])
          )
   .sort_values(by=['max', 'category', 'score', 'dt'],
                ascending=[False, True, False, True]
               )
   .drop(columns=['max', 'dt'])
)

Same logic without the temporary columns with numpy.lexsort:

import numpy as np

order = np.lexsort([pd.to_datetime(df['time']),
                    -df['score'],
                    df['category'],
                    -df.groupby('category')['score'].transform('max')])

out = df.iloc[order]

Output:

    name  score                 time category
4  peter      5  2023/11/22 11:00:00        A
1    tom      3   2023/11/22 9:25:00        A
0   nick      3  2023/11/22 10:05:00        A
5  johan      5   2023/11/22 9:00:00        C
3    ewa      4   2023/11/22 9:55:00        B
2   juli      2  2023/11/22 12:05:00        B

breaking a tie between two groups with the same max

Here groups A and C have the same maximum, if you want to have C on top because it has a greater average score than A, you can add an extra sorter:

g = df.groupby('category')['score'].transform

out = (df
   .assign(max=g('max'),
           mean=g('mean'),
           dt=pd.to_datetime(df['time'])
          )
   .sort_values(by=['max', 'mean', 'category', 'score', 'dt'],
                ascending=[False, False, True, False, True]
               )
   .drop(columns=['max', 'mean', 'dt'])
)

Or:

import numpy as np

g = df.groupby('category')['score'].transform
order = np.lexsort([pd.to_datetime(df['time']),
                    -df['score'],
                    df['category'],
                    -g('mean'),
                    -g('max')])

out = df.iloc[order]

Output:

    name  score                 time category
5  johan      5   2023/11/22 9:00:00        C
4  peter      5  2023/11/22 11:00:00        A
1    tom      3   2023/11/22 9:25:00        A
0   nick      3  2023/11/22 10:05:00        A
3    ewa      4   2023/11/22 9:55:00        B
2   juli      2  2023/11/22 12:05:00        B