I have a dataframe with three columns: Year, Product, and Price. I want to find the minimum Price for each year, excluding zeros, and also return the Product value from the same row as that minimum.
Data:
Year Product Price
2000 Grapes 0
2000 Apple 220
2000 pear 185
2000 Watermelon 172
2001 Orange 0
2001 Muskmelon 90
2001 Pear 165
2001 Watermelon 99
Desired output in a new dataframe:
Year Minimum Price Product
2000 172 Watermelon
2001 90 Muskmelon
First, filter out the rows where Price is 0 with boolean indexing:
df1 = df[df['Price'] != 0]
Then use DataFrameGroupBy.idxmin to get the index label of the minimal Price in each group, and select those rows with loc:
df2 = df1.loc[df1.groupby('Year')['Price'].idxmin()]
An alternative is to use sort_values with drop_duplicates, which keeps the first (lowest-priced) row for each Year:
df2 = df1.sort_values(['Year', 'Price']).drop_duplicates('Year')
print (df2)
Year Product Price
3 2000 Watermelon 172
5 2001 Muskmelon 90
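The result above keeps the original column names and order. As a minimal sketch, assuming the goal is the exact layout shown in the question (Year, Minimum Price, Product), the idxmin approach can be followed by a rename and a column reorder. The variable name result and the reset_index call are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    'Product': ['Grapes', 'Apple', 'pear', 'Watermelon',
                'Orange', 'Muskmelon', 'Pear', 'Watermelon'],
    'Price': [0, 220, 185, 172, 0, 90, 165, 99],
})

# Drop zero prices, then pick the row holding the smallest Price in each Year.
df1 = df[df['Price'] != 0]
df2 = df1.loc[df1.groupby('Year')['Price'].idxmin()]

# Rename and reorder the columns to match the layout asked for in the question.
result = (df2.rename(columns={'Price': 'Minimum Price'})
             [['Year', 'Minimum Price', 'Product']]
             .reset_index(drop=True))
print(result)
```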
If a year can have several rows tied for the minimal Price and you need all of them per group:
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 172
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2001 Pear 165
7 2001 Watermelon 99
df1 = df[df['Price'] != 0]
df = df1[df1['Price'].eq(df1.groupby('Year')['Price'].transform('min'))]
print (df)
Year Product Price
2 2000 pear 172
3 2000 Watermelon 172
5 2001 Muskmelon 90
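To see why the transform-based filter is needed for ties, the following sketch runs both approaches on the tied data above. It assumes the same sample frame; the names first_only and all_ties are hypothetical and used only for illustration. idxmin reports a single index label per group, so it keeps just one of the tied rows:

```python
import pandas as pd

# Sample data with a tie in 2000: pear and Watermelon both cost 172.
df = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    'Product': ['Grapes', 'Apple', 'pear', 'Watermelon',
                'Orange', 'Muskmelon', 'Pear', 'Watermelon'],
    'Price': [0, 220, 172, 172, 0, 90, 165, 99],
})
df1 = df[df['Price'] != 0]

# idxmin yields one label per Year, so only one tied row survives.
first_only = df1.loc[df1.groupby('Year')['Price'].idxmin()]

# transform('min') broadcasts each group's minimum back to every row,
# so comparing against it keeps every row that matches the minimum.
group_min = df1.groupby('Year')['Price'].transform('min')
all_ties = df1[df1['Price'].eq(group_min)]

print(first_only)
print(all_ties)
```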
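EDIT: If a year contains only zero prices and should still appear in the output, replace 0 with NaN instead of filtering the rows out, then mark those years as having no data: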
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 185
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2002 Pear 0
7 2002 Watermelon 0
import numpy as np
df['Price'] = df['Price'].replace(0, np.nan)
df2 = df.sort_values(['Year', 'Price']).drop_duplicates('Year')
df2['Product'] = df2['Product'].mask(df2['Price'].isnull(), 'No data')
print (df2)
Year Product Price
3 2000 Watermelon 172.0
5 2001 Muskmelon 90.0
6 2002 No data NaN
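For reference, here is a self-contained sketch of the same idea that leaves the original df untouched. It assumes the sample data from the EDIT above; the variable name out and the use of assign (rather than in-place column assignment) are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data from the EDIT: every 2002 price is zero.
df = pd.DataFrame({
    'Year': [2000, 2000, 2000, 2000, 2001, 2001, 2002, 2002],
    'Product': ['Grapes', 'Apple', 'pear', 'Watermelon',
                'Orange', 'Muskmelon', 'Pear', 'Watermelon'],
    'Price': [0, 220, 185, 172, 0, 90, 0, 0],
})

# Turn zeros into NaN on a copy; sort_values places NaN last by default,
# so real prices come first within each Year.
out = df.assign(Price=df['Price'].replace(0, np.nan))
out = out.sort_values(['Year', 'Price']).drop_duplicates('Year')

# A Year whose first row is still NaN had no non-zero price at all.
out = out.assign(Product=out['Product'].mask(out['Price'].isna(), 'No data'))
print(out)
```

Because the zeros become NaN, the Price column is converted to float, which is why the output shows 172.0 and 90.0.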