
Pandas groupby transform minimum greater than 0


I have a Pandas dataframe that looks like

Race_ID   Date           Student_ID      Rank  
1         1/1/2023       1               3     
1         1/1/2023       2               8     
1         1/1/2023       3               0     
1         1/1/2023       4               4     
2         11/9/2022      1               2     
2         11/9/2022      2               3     
2         11/9/2022      3               9     
3         17/4/2022      5               0     
3         17/4/2022      2               1     
3         17/4/2022      3               2     
3         17/4/2022      4               5     
4         1/3/2022       1               6     
4         1/3/2022       2               2     
5         1/1/2021       1               0     
5         1/1/2021       2               3     
5         1/1/2021       3               1     

I want to add a new column, min>0, that holds the smallest Rank greater than 0 within each Race_ID group, so the desired outcome looks like

Race_ID   Date           Student_ID      Rank  min>0  
1         1/1/2023       1               3     3
1         1/1/2023       2               8     3
1         1/1/2023       3               0     3
1         1/1/2023       4               4     3
2         11/9/2022      1               2     2
2         11/9/2022      2               3     2
2         11/9/2022      3               9     2
3         17/4/2022      5               0     1
3         17/4/2022      2               1     1
3         17/4/2022      3               2     1
3         17/4/2022      4               5     1
4         1/3/2022       1               6     2
4         1/3/2022       2               2     2
5         1/1/2021       1               0     1
5         1/1/2021       2               3     1
5         1/1/2021       3               1     1

I know how to use groupby with transform('min'), but I don't know how to include the condition >0.


Solution

  • Mask or filter out the unwanted values (here, ranks that are not greater than 0), then compute the minimum of what remains.

    You have several options.

    Pre-masking, then using groupby.transform:

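    # where() turns ranks that are not > 0 into NaN, which min skips
    df['min>0'] = (df['Rank'].where(df['Rank']>0)
                   .groupby(df['Race_ID']).transform('min')
                   # masking made the column float; convert back to a nullable integer dtype
                   .convert_dtypes()
                  )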
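
To see why the pre-masking variant ends with convert_dtypes, here is a minimal sketch on a small made-up frame (the four rows below are an assumption for illustration, not the data from the question). It only inspects the dtypes at each step.

```python
import pandas as pd

# Hypothetical toy data, not the frame from the question
df = pd.DataFrame({"Race_ID": [1, 1, 2, 2], "Rank": [3, 0, 0, 5]})

# Ranks that are not > 0 are replaced by NaN, so the column becomes float64
masked = df["Rank"].where(df["Rank"] > 0)

# The per-group min skips NaN and is broadcast back to every row
per_group = masked.groupby(df["Race_ID"]).transform("min")

# convert_dtypes picks a nullable integer dtype for integer-valued floats
result = per_group.convert_dtypes()

print(masked.dtype, per_group.dtype, result.dtype)
```

Without the final step the new column would still hold the right values, just stored as floats.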
    

    Using a custom transform function:

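    # for each race, keep only the positive ranks and take their min;
    # transform broadcasts that single value to every row of the group
    df['min>0'] = (df.groupby('Race_ID')['Rank']
                     .transform(lambda x: x[x>0].min())
                  )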
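
One case the sample data does not cover is a race in which no rank is greater than 0. The sketch below uses a made-up frame (an assumption, not the question's data) to show what the custom function returns there: the filtered group is empty, its min is NaN, and the whole column becomes float.

```python
import pandas as pd

# Hypothetical toy data: race 2 has no rank greater than 0
df = pd.DataFrame({"Race_ID": [1, 1, 2, 2], "Rank": [4, 0, 0, 0]})

out = df.groupby("Race_ID")["Rank"].transform(lambda x: x[x > 0].min())

print(out)
```

If such races can occur, you may want to decide explicitly how to fill the missing values, for example with fillna.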
    

    Or with a filter, an aggregation, then map to broadcast back to the original shape:

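    # keep rows with Rank > 0, take the min per race,
    # then look up each row's Race_ID in that per-race Series
    df['min>0'] = df['Race_ID'].map(df[df['Rank'].gt(0)]
                                    .groupby('Race_ID')['Rank'].min())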
    

    Output:

        Race_ID       Date  Student_ID  Rank  min>0
    0         1   1/1/2023           1     3      3
    1         1   1/1/2023           2     8      3
    2         1   1/1/2023           3     0      3
    3         1   1/1/2023           4     4      3
    4         2  11/9/2022           1     2      2
    5         2  11/9/2022           2     3      2
    6         2  11/9/2022           3     9      2
    7         3  17/4/2022           5     0      1
    8         3  17/4/2022           2     1      1
    9         3  17/4/2022           3     2      1
    10        3  17/4/2022           4     5      1
    11        4   1/3/2022           1     6      2
    12        4   1/3/2022           2     2      2
    13        5   1/1/2021           1     0      1
    14        5   1/1/2021           2     3      1
    15        5   1/1/2021           3     1      1
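
For a reproducible check, the following sketch rebuilds the sample frame from the question and runs the three variants side by side. The variable names (premask, custom, mapped, expected) are only illustrative; the expected values are copied from the desired output in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Race_ID": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5],
    "Date": (["1/1/2023"] * 4 + ["11/9/2022"] * 3 + ["17/4/2022"] * 4
             + ["1/3/2022"] * 2 + ["1/1/2021"] * 3),
    "Student_ID": [1, 2, 3, 4, 1, 2, 3, 5, 2, 3, 4, 1, 2, 1, 2, 3],
    "Rank": [3, 8, 0, 4, 2, 3, 9, 0, 1, 2, 5, 6, 2, 0, 3, 1],
})

# 1. Pre-mask, then groupby.transform
premask = (df["Rank"].where(df["Rank"] > 0)
           .groupby(df["Race_ID"]).transform("min")
           .convert_dtypes())

# 2. Custom transform function
custom = df.groupby("Race_ID")["Rank"].transform(lambda x: x[x > 0].min())

# 3. Filter, aggregate, then map back onto Race_ID
mapped = df["Race_ID"].map(
    df[df["Rank"].gt(0)].groupby("Race_ID")["Rank"].min()
)

# Desired min>0 column from the question
expected = [3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1]
print(premask.tolist() == custom.tolist() == mapped.tolist() == expected)

df["min>0"] = premask
```

All three give the same values on this data; they differ mainly in the resulting dtype and in how much work is done in Python per group.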