I have a pandas DataFrame that looks like this:
Race_ID Date Student_ID Rank
1 1/1/2023 1 3
1 1/1/2023 2 8
1 1/1/2023 3 0
1 1/1/2023 4 4
2 11/9/2022 1 2
2 11/9/2022 2 3
2 11/9/2022 3 9
3 17/4/2022 5 0
3 17/4/2022 2 1
3 17/4/2022 3 2
3 17/4/2022 4 5
4 1/3/2022 1 6
4 1/3/2022 2 2
5 1/1/2021 1 0
5 1/1/2021 2 3
5 1/1/2021 3 1
I want to add a new column, min>0, which holds the minimum value of Rank greater than 0 within each Race_ID group, so the desired outcome looks like this:
Race_ID Date Student_ID Rank min>0
1 1/1/2023 1 3 3
1 1/1/2023 2 8 3
1 1/1/2023 3 0 3
1 1/1/2023 4 4 3
2 11/9/2022 1 2 2
2 11/9/2022 2 3 2
2 11/9/2022 3 9 2
3 17/4/2022 5 0 1
3 17/4/2022 2 1 1
3 17/4/2022 3 2 1
3 17/4/2022 4 5 1
4 1/3/2022 1 6 2
4 1/3/2022 2 2 2
5 1/1/2021 1 0 1
5 1/1/2021 2 3 1
5 1/1/2021 3 1 1
I know about groupby and transform('min'), but I don't know how to include the condition >0.
Filter/mask the values and compute the min excluding the unwanted values.
You have several options.
Pre-masking, then using groupby.transform:
df['min>0'] = (df['Rank'].where(df['Rank'] > 0)
               .groupby(df['Race_ID']).transform('min')
               .convert_dtypes()
              )
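To see why the final convert_dtypes step is there, here is a minimal sketch on hypothetical toy data (the toy frame and the intermediate variable names are my own, not part of the question). Masking with where replaces non-positive ranks with NaN, which forces a float dtype; convert_dtypes then brings the result back to a nullable integer dtype. The second race deliberately has no rank above 0, to show what a group without any valid value ends up with.

```python
import pandas as pd

# Hypothetical toy data: race 2 has no rank greater than 0.
toy = pd.DataFrame({'Race_ID': [1, 1, 2], 'Rank': [3, 0, 0]})

# Step 1: non-positive ranks become NaN, so the dtype becomes float64.
masked = toy['Rank'].where(toy['Rank'] > 0)

# Step 2: per-group min skips NaN; a group that is all NaN stays NaN.
group_min = masked.groupby(toy['Race_ID']).transform('min')

# Step 3: convert to a nullable integer dtype (missing values become <NA>).
result = group_min.convert_dtypes()

print(masked.dtype, group_min.dtype, result.dtype)
print(result.tolist())
```

If every group is guaranteed to contain at least one positive rank, a plain astype(int) would presumably work as well; convert_dtypes is the safer choice when that is not certain.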
Using a custom transform function:
df['min>0'] = (df.groupby('Race_ID')['Rank']
               .transform(lambda x: x[x > 0].min())
              )
Or with a filter, an aggregation, then map to broadcast back to the original shape:
df['min>0'] = df['Race_ID'].map(df[df['Rank'].gt(0)]
                                .groupby('Race_ID')['Rank'].min())
Output:
Race_ID Date Student_ID Rank min>0
0 1 1/1/2023 1 3 3
1 1 1/1/2023 2 8 3
2 1 1/1/2023 3 0 3
3 1 1/1/2023 4 4 3
4 2 11/9/2022 1 2 2
5 2 11/9/2022 2 3 2
6 2 11/9/2022 3 9 2
7 3 17/4/2022 5 0 1
8 3 17/4/2022 2 1 1
9 3 17/4/2022 3 2 1
10 3 17/4/2022 4 5 1
11 4 1/3/2022 1 6 2
12 4 1/3/2022 2 2 2
13 5 1/1/2021 1 0 1
14 5 1/1/2021 2 3 1
15 5 1/1/2021 3 1 1
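For anyone who wants to reproduce this, the following self-contained sketch rebuilds the sample data and checks that the three approaches agree. It assumes only the Race_ID and Rank columns matter for the computation, so Date and Student_ID are left out; the variable names (premask, custom, mapped, expected) are mine.

```python
import pandas as pd

# Sample data from the question (Date and Student_ID omitted for brevity).
df = pd.DataFrame({
    'Race_ID': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5],
    'Rank':    [3, 8, 0, 4, 2, 3, 9, 0, 1, 2, 5, 6, 2, 0, 3, 1],
})

# Option 1: mask the unwanted values first, then take the per-group min.
premask = (df['Rank'].where(df['Rank'] > 0)
           .groupby(df['Race_ID']).transform('min')
           .convert_dtypes())

# Option 2: filter inside a custom transform function.
custom = df.groupby('Race_ID')['Rank'].transform(lambda x: x[x > 0].min())

# Option 3: filter rows, aggregate per race, then map back onto Race_ID.
mapped = df['Race_ID'].map(df[df['Rank'].gt(0)]
                           .groupby('Race_ID')['Rank'].min())

# Values of the min>0 column from the desired outcome in the question.
expected = [3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1]

for name, series in [('premask', premask), ('custom', custom), ('mapped', mapped)]:
    print(name, series.tolist() == expected)
```

The first and third options rely on built-in aggregations, so they should generally scale better than the lambda-based transform on frames with many groups.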