I have a dataframe that looks like
Group_ID probability
34883 0.002676
34883 0.17826266
34883 0.01399753
34883 0.04569782
34883 0.02799506
34883 0.02634829
34883 0.02923014
34883 0.13544669
34883 0.07595718
34883 0.19246604
34883 0.20028818
34883 0
34883 0
34883 0.07163442
34897 0.03329843
34897 0.07643979
34897 0.09570681
34897 0.00376963
34897 0.01780105
34897 0.0008377
34897 0.08125654
34897 0.10764398
34897 0.25780105
34897 0.10910995
34897 0
34897 0.02743455
34897 0.18890052
34897 0
where the probabilities in probability
sum to 1 for each Group_ID
. I want to create a new column called top 3
which indicates the position of the largest 3 probabilities for each Group_ID and top3 = 1 if the row has the greatest probability for that Group_ID and zero otherwise. Hence the outcome looks like:
Group_ID probability top3
34883 0.002676 0
34883 0.17826266 1
34883 0.01399753 0
34883 0.04569782 0
34883 0.02799506 0
34883 0.02634829 0
34883 0.02923014 0
34883 0.13544669 0
34883 0.07595718 0
34883 0.19246604 1
34883 0.20028818 1
34883 0 0
34883 0 0
34883 0.07163442 0
34897 0.03329843 0
34897 0.07643979 0
34897 0.09570681 0
34897 0.00376963 0
34897 0.01780105 0
34897 0.0008377 0
34897 0.08125654 0
34897 0.10764398 0
34897 0.25780105 1
34897 0.10910995 1
34897 0 0
34897 0.02743455 0
34897 0.18890052 1
34897 0 0
I googled for a bit and I think it might have something to do with idxmax but I am not sure how to proceed. Thank you in advance.
Use a groupy.rank
in descending order, then keep the values ≤ (le
) 3, finally convert the boolean Series to integer with astype
:
df['top3'] = (df.groupby('Group_ID')['probability']
.rank('dense', ascending=False).le(3).astype(int)
)
Output:
Group_ID probability top3
0 34883 0.002676 0
1 34883 0.178263 1
2 34883 0.013998 0
3 34883 0.045698 0
4 34883 0.027995 0
5 34883 0.026348 0
6 34883 0.029230 0
7 34883 0.135447 0
8 34883 0.075957 0
9 34883 0.192466 1
10 34883 0.200288 1
11 34883 0.000000 0
12 34883 0.000000 0
13 34883 0.071634 0
14 34897 0.033298 0
15 34897 0.076440 0
16 34897 0.095707 0
17 34897 0.003770 0
18 34897 0.017801 0
19 34897 0.000838 0
20 34897 0.081257 0
21 34897 0.107644 0
22 34897 0.257801 1
23 34897 0.109110 1
24 34897 0.000000 0
25 34897 0.027435 0
26 34897 0.188901 1
27 34897 0.000000 0