Say I had a csv file with 3 columns, 'name', 'price' and 'color'.
How could I go about getting the variable of the name of say, the most expensive blue item, the most expensive red, and the most expensive yellow?
Would really appreciate any help :)
Our plan is to find the class we want (say, "blue" items) and, then, find the most expensive (the maximum in the price
column).
Let's define an example DataFrame:
import pandas as pd
df = pd.DataFrame({
'name': [a for a in "abcdef"],
'price': [1.5, 3.8, 1.4, 5.9, 3.5, 1.9],
'color': ['blue', 'red', 'yellow', 'blue', 'red', 'yellow']
}).set_index('name')
And here our DataFrame:
price color
name
a 1.5 blue
b 3.8 red
c 1.4 yellow
d 5.9 blue
e 3.5 red
f 1.9 yellow
To do the first part (find the items of a specific color), we can use Pandas' query
. So the following will select blue items and save to blue_items
.
blue_items = df[df.color == "blue"] # selects the df's slice in which df.color is equals to "blue".
Then we can get the index of the maximum price (as I've defined name
to be the index column, it'll return the name):
blue_items["price"].idxmax()
The complete code (now considering you're importing a CSV file):
import pandas as pd
df = pd.read_csv("filename.csv", index_col="name")
most_exp_blue = df[df.color == "blue"]["price"].idxmax() # the most expensive blue
most_exp_red = df[df.color == "red"]["price"].idxmax() # the most expensive red
most_exp_yellow = df[df.color == "yellow"]["price"].idxmax() # the most expensive yellow