
How to find the smallest maximum of a column with pandas after filtering?


I have a dataframe:

import pandas as pd
df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)

For each team, I would like to find the largest value of variable (which counts upwards) among the rows where another_variable is still equal to 1.

I can group the data frame and filter the relevant rows:

df.groupby(['team']).apply(lambda g: g[g['another_variable'] == 1])

# Output:
#       team    variable    another_variable
#team               
#A  0   A       8           1
#   1   A       9           1
#   2   A       10          1
#B  4   B       2           1
#   5   B       3           1

But if I append .variable.min(), I get only a single value instead of one value per group (of which I could then take the maximum). What am I doing wrong?
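A short sketch of why this happens, reusing the question's DataFrame. The point it illustrates: groupby(...).apply(...) returns an ordinary DataFrame (with team as the outer index level), not a GroupBy object, so a reduction such as .min() runs over all remaining rows at once. Grouping again on that index level is one way to get a per-team result back; the variable names applied and per_team are only illustrative. Recent pandas versions may emit a deprecation warning about apply operating on the grouping column; it does not change the result here.

```python
import pandas as pd

df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [1, 1, 1, 2, 1, 1, 2, 2]}
)

# The question's step: the result is a plain DataFrame indexed by (team, original row),
# so the grouping is gone once apply() has finished.
applied = df.groupby(['team']).apply(lambda g: g[g['another_variable'] == 1])
print(type(applied).__name__)  # DataFrame

# A reduction on a column of a plain DataFrame covers every remaining row: one scalar.
print(applied['variable'].min())  # 2

# Grouping again on the 'team' index level gives one value per team.
per_team = applied.groupby(level='team')['variable'].max()
print(per_team)
```

Filtering before grouping, as in the answer, avoids the extra apply step altogether.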


Solution

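  • The result of apply is an ordinary DataFrame that is no longer grouped, so .variable.min() reduces the whole column to one scalar. Filter first, then group by team and take the maximum: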

    df[df['another_variable'].eq(1)].groupby('team')['variable'].max()
    

    Output:

    team
    A    10
    B     3
    Name: variable, dtype: int64
    

    If a team might have no row with another_variable equal to 1 and you want NaN for that team (instead of the team being dropped from the result), use:

    df['variable'].where(df['another_variable'].eq(1)).groupby(df['team']).max()
    

    Example output if team A had no row with another_variable equal to 1:

    team
    A    NaN
    B    3.0
    Name: variable, dtype: float64
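A sketch comparing the two approaches on a made-up variant of the data in which team A never has another_variable equal to 1 (the variant values are hypothetical, chosen only to trigger the empty-group case). Filtering first drops team A entirely, while the where() version keeps it as NaN; because where() inserts NaN, that result is float64. The last step assumes the title's "smallest maximum" means the smallest of the per-team maxima, which is an interpretation rather than something the question states; Series.min() skips NaN by default.

```python
import pandas as pd

# Hypothetical variant of the question's data: team A never has another_variable == 1.
df = pd.DataFrame(
    {'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
     'variable': [8, 9, 10, 11, 2, 3, 4, 5],
     'another_variable': [2, 2, 2, 2, 1, 1, 2, 2]}
)
mask = df['another_variable'].eq(1)

# Filter, then group: teams without a matching row disappear from the result.
dropped = df[mask].groupby('team')['variable'].max()

# Mask with where(), then group: non-matching values become NaN, so every team is kept.
kept = df['variable'].where(mask).groupby(df['team']).max()

# Assumed reading of the title: the smallest of the per-team maxima (NaN is skipped).
smallest_max = kept.min()

print(dropped)
print(kept)
print(smallest_max)
```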