python python-3.x pandas dataframe group-by

New column based on the largest 3 elements in a group indexed by Group ID

I have a dataframe that looks like

Group_ID    probability 
34883       0.002676    
34883       0.17826266  
34883       0.01399753  
34883       0.04569782  
34883       0.02799506  
34883       0.02634829  
34883       0.02923014  
34883       0.13544669  
34883       0.07595718  
34883       0.19246604  
34883       0.20028818  
34883       0           
34883       0           
34883       0.07163442  
34897       0.03329843  
34897       0.07643979  
34897       0.09570681  
34897       0.00376963  
34897       0.01780105  
34897       0.0008377   
34897       0.08125654  
34897       0.10764398  
34897       0.25780105  
34897       0.10910995  
34897       0           
34897       0.02743455  
34897       0.18890052  
34897       0

where the probabilities in probability sum to 1 for each Group_ID. I want to create a new column called top 3 which indicates the position of the largest 3 probabilities for each Group_ID and top3 = 1 if the row has the greatest probability for that Group_ID and zero otherwise. Hence the outcome looks like:

Group_ID    probability top3
34883       0.002676    0
34883       0.17826266  1
34883       0.01399753  0
34883       0.04569782  0
34883       0.02799506  0
34883       0.02634829  0
34883       0.02923014  0
34883       0.13544669  0
34883       0.07595718  0
34883       0.19246604  1
34883       0.20028818  1
34883       0           0
34883       0           0
34883       0.07163442  0
34897       0.03329843  0
34897       0.07643979  0
34897       0.09570681  0
34897       0.00376963  0
34897       0.01780105  0
34897       0.0008377   0
34897       0.08125654  0
34897       0.10764398  0
34897       0.25780105  1
34897       0.10910995  1
34897       0           0
34897       0.02743455  0
34897       0.18890052  1
34897       0           0

I googled for a bit and I think it might have something to do with idxmax but I am not sure how to proceed. Thank you in advance.

Solution

Use a groupy.rank in descending order, then keep the values ≤ (le) 3, finally convert the boolean Series to integer with astype:

df['top3'] = (df.groupby('Group_ID')['probability']
                .rank('dense', ascending=False).le(3).astype(int)
             )

Output:

    Group_ID  probability  top3
0      34883     0.002676     0
1      34883     0.178263     1
2      34883     0.013998     0
3      34883     0.045698     0
4      34883     0.027995     0
5      34883     0.026348     0
6      34883     0.029230     0
7      34883     0.135447     0
8      34883     0.075957     0
9      34883     0.192466     1
10     34883     0.200288     1
11     34883     0.000000     0
12     34883     0.000000     0
13     34883     0.071634     0
14     34897     0.033298     0
15     34897     0.076440     0
16     34897     0.095707     0
17     34897     0.003770     0
18     34897     0.017801     0
19     34897     0.000838     0
20     34897     0.081257     0
21     34897     0.107644     0
22     34897     0.257801     1
23     34897     0.109110     1
24     34897     0.000000     0
25     34897     0.027435     0
26     34897     0.188901     1
27     34897     0.000000     0