I'm trying to find out which occupation has the max mean salary.
I've tried
df.groupby('Occupation').agg({'Salary':'mean'})
I think I've figured out how to get the max mean salary, but I can't figure out how to get the corresponding occupation title. Any tips? Thank you!
When you perform a groupby, the columns you group by become the index of the result. Because the aggregated mean salaries are indexed by occupation, you can call idxmax on them to retrieve the index label (the occupation) where the maximum mean salary occurs. However, if several occupations share the same maximum mean salary, idxmax returns only the first of them.
import pandas as pd
df = pd.DataFrame({'Occupation':list('aaabbbccc'),'Salary':[1,2,3,4,5,6,7,8,9]})
# Select the Salary column so the result is a Series indexed by Occupation
occupation_max_salary = df.groupby('Occupation')['Salary'].mean().idxmax()
occupation_max_salary is 'c', as expected.
So if you need to account for possible ties in mean salary, then you can try the following:
df2 = pd.DataFrame({'Occupation':list('aaabbbccc'),'Salary':[1,2,3,7,8,9,7,8,9]})
# Mean salary per occupation, as a Series indexed by Occupation
salaries = df2.groupby('Occupation')['Salary'].mean()
# Keep every occupation whose mean equals the maximum
occupation_max_salary = salaries[salaries == salaries.max()].index.tolist()
In this case, occupation_max_salary is ['b', 'c'].
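If it helps, here is a minimal sketch that covers both the single-winner and the tied case with one helper. The function name top_occupations and its parameters are my own for illustration, not part of pandas, and it assumes a DataFrame with one column to group by and one numeric column to average.

```python
import pandas as pd


def top_occupations(frame, group_col="Occupation", value_col="Salary"):
    """Hypothetical helper: return (labels, max_mean) for all groups tied at the top."""
    # Mean of value_col per group; the group labels become the index
    means = frame.groupby(group_col)[value_col].mean()
    max_mean = means.max()
    # Keep every label whose mean equals the maximum, so ties are not dropped
    labels = means[means == max_mean].index.tolist()
    return labels, max_mean


df = pd.DataFrame({"Occupation": list("aaabbbccc"),
                   "Salary": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
df2 = pd.DataFrame({"Occupation": list("aaabbbccc"),
                    "Salary": [1, 2, 3, 7, 8, 9, 7, 8, 9]})

single = top_occupations(df)   # one occupation has the highest mean
tied = top_occupations(df2)    # two occupations share the highest mean

print(single)
print(tied)
```

Returning a list in both cases means the caller does not have to special-case ties. One caveat: the comparison is an exact equality on floating-point means, so with real salary data you may prefer a tolerance-based check.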