I was hoping to get help with the following:
I have a given df below of:
df
fruit State Count
apples CA 45
apples VT 54
apples MI 18
pears TX 20
pears AZ 89
plums NV 62
plums ID 10
I took all the highest counts for each fruit per state, and was able to get something back like:
df2
fruit State Count
apples VT 54
pears AZ 89
plums NV 62
Now I am trying to figure out how to get the 'State' values from df2 as a new column in df to look like something like this:
df
fruit State Count Main
apples CA 45 VT
apples VT 54 VT
apples MI 18 VT
pears TX 20 AZ
pears AZ 89 AZ
plums NV 62 NV
plums ID 10 NV
I can do something similar with the .transform() function, but i only know how to do that while calling the max function. Could i run transform on a df['list']? Or am i missing something else here?
Use GroupBy.transform
by DataFrameGroupBy.idxmax
, but first need set_index
for indices by column State
:
df['new'] = df.set_index('State').groupby('fruit')['Count'].transform('idxmax').values
print (df)
fruit State Count new
0 apples CA 45 VT
1 apples VT 54 VT
2 apples MI 18 VT
3 pears TX 20 AZ
4 pears AZ 89 AZ
5 plums NV 62 NV
6 plums ID 10 NV
Another solution with sort_values
, drop_duplicates
and set_index
for map
by this Series
:
s = (df.sort_values('Count', ascending= False)
.drop_duplicates('fruit')
.set_index('fruit')['State'])
print (s)
fruit
pears AZ
plums NV
apples VT
Name: State, dtype: object
df['new'] = df['fruit'].map(s)
print (df)
fruit State Count new
0 apples CA 45 VT
1 apples VT 54 VT
2 apples MI 18 VT
3 pears TX 20 AZ
4 pears AZ 89 AZ
5 plums NV 62 NV
6 plums ID 10 NV