Tags: python, pandas, group-by

Obtaining grouped max() or min() in pandas without skipping NaNs


Consider a sample DataFrame:

df = pd.DataFrame({'group' : [1, 2, 2], 'x' : [1, 2, 3], 'y' : [2, 3, np.nan]})

If I want to get the max value of column 'y' without skipping NaNs, I would use:

df.y.max(skipna = False)

The returned result is nan, as expected.

However, if I want to calculate the grouped max value by 'group', as follows:

df.groupby('group').y.max(skipna = False)

I get an error message: TypeError: max() got an unexpected keyword argument 'skipna'

It seems that the groupby max() method (here called on a SeriesGroupBy, since a single column is selected) does not accept a skipna argument. What would be the best way to get the desired result?


Solution

  • You can apply pd.Series.max to each group and pass skipna=False through to it:

    x = df.groupby("group")["y"].apply(pd.Series.max, skipna=False)
    print(x)
    

    Prints:

    group
    1    2.0
    2    NaN
    Name: y, dtype: float64
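
As a possible alternative to calling a Python function once per group, the same result can be sketched with built-in groupby reductions: take the ordinary (NaN-skipping) grouped max, then blank out every group that contains at least one NaN. This is only a sketch on the question's sample data; the names has_nan, group_max and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'group': [1, 2, 2], 'x': [1, 2, 3], 'y': [2, 3, np.nan]})

# True for every group that contains at least one NaN in 'y'
has_nan = df['y'].isna().groupby(df['group']).any()

# Ordinary grouped max, which skips NaNs
group_max = df.groupby('group')['y'].max()

# Replace the max with NaN wherever the group had a missing value
result = group_max.mask(has_nan)
print(result)
```

On the sample data this should give 2.0 for group 1 and NaN for group 2, matching the apply-based solution. The same pattern works for min() by swapping the reduction.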