Consider a sample DataFrame:
df = pd.DataFrame({'group' : [1, 2, 2], 'x' : [1, 2, 3], 'y' : [2, 3, np.nan]})
If I want the max value of column 'y' without skipping NaNs, I would use:
df.y.max(skipna = False)
The returned result is nan, as expected.
However, if I want to calculate the grouped max value by 'group', as follows:
df.groupby('group').y.max(skipna = False)
I get an error message: TypeError: max() got an unexpected keyword argument 'skipna'
It seems that DataFrameGroupBy.max() does not accept a skipna argument. What would be the best way to get the desired result?
You can try applying pd.Series.max to each group, which does accept skipna:
x = df.groupby("group")["y"].apply(pd.Series.max, skipna=False)
print(x)
Prints:
group
1 2.0
2 NaN
Name: y, dtype: float64
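If you would rather not go through apply, here are two other ways that should give the same result. This is only a sketch on the sample data from the question. The names via_agg, has_nan and via_mask are made up for illustration, and I have not compared performance on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"group": [1, 2, 2], "x": [1, 2, 3], "y": [2, 3, np.nan]})

# Option 1: aggregate each group with Series.max, which accepts skipna.
via_agg = df.groupby("group")["y"].agg(lambda s: s.max(skipna=False))

# Option 2: take the default (NaN-skipping) grouped max, then set the
# result to NaN for every group that contains at least one NaN.
has_nan = df["y"].isna().groupby(df["group"]).any()
via_mask = df.groupby("group")["y"].max().mask(has_nan)

print(via_agg)
print(via_mask)
```

Both give 2.0 for group 1 and NaN for group 2. The second option avoids calling a Python function once per group, which may matter when there are many groups.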