I have the following dataframe:
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
'Parrot', 'Parrot'],
'Habitat':['Jungle', 'Jungle',
'Sky', 'Sky'],
'Tmp':['A', 'B', 'C', 'D'],
'Max Speed': [380., 370., 24., 26.]})
>>> df
Animal Habitat Tmp Max Speed
0 Falcon Jungle A 380.0
1 Falcon Jungle B 370.0
2 Parrot Sky C 24.0
3 Parrot Sky D 26.0
I am trying to add additional column "Output" which will take the value from "Tmp" based on maximum value of column "Max Speed" in a groupby done of columns "Animal" and "Habitat".
Desired output:
Animal Habitat Tmp Max Speed Output
0 Falcon Jungle A 380.0 A
1 Falcon Jungle B 370.0 A
2 Parrot Sky C 24.0 D
3 Parrot Sky D 26.0 D
It can be done using a groupby
and then joining it in the original dataset. But is there a more efficient way to do this? Maybe using transform
or something else?
You can define a function taking pd.dataframe
as argument:
import pandas as pd
import numpy as np
def fmax(df_):
df_['Output'] = df_.sort_values(['Max Speed']).tail(1)['Tmp'].squeeze()
return df_
Please note use of pandas.DataFrame.squeeze
function to return scalar value.
Then simply apply
above function using groupby
:
df.groupby(['Animal','Habitat']).apply(fmax)
The result is:
Animal Habitat Tmp Max Speed Output
0 Falcon Jungle A 380.0 A
1 Falcon Jungle B 370.0 A
2 Parrot Sky C 24.0 D
3 Parrot Sky D 26.0 D