I have two Pandas DataFrames, one contains data I want to update and the other provides a lookup based on a MultiIndex key to set a value.
As an example, I have two csv's:
fruit.csv
Fruit,Color,State,more,data
Apple,Red,Good,etc.,etc.
Apple,Green,Mouldy,etc.,etc.
Apple,Green,Excellent,etc.,etc.
Pear,Red,Excellent,etc.,etc.
Pear,Green,Good,etc.,etc.
Lime,Green,Bad,etc.,etc.
rating.csv
Fruit,State,Rating
Apple,Excellent,11
Apple,Good,8
Apple,Bad,4
Apple,Mouldy,0
Pear,Excellent,9
Pear,Good,5
Pear,Bad,2
Pear,Mouldy,1
Lime,Excellent,10
Lime,Good,7
Lime,Bad,5
Lime,Mouldy,2
Which I've read into DataFrames:
static_data_dir = Path(__file__).resolve().parent
fruit = pd.read_csv(static_data_dir.joinpath("fruit.csv"), index_col=["Fruit","Color"])
rating = pd.read_csv(static_data_dir.joinpath("rating.csv"), index_col=["Fruit","State"])
State more data
Fruit Color
Apple Red Good etc. etc.
Green Mouldy etc. etc.
Green Excellent etc. etc.
Pear Red Excellent etc. etc.
Green Good etc. etc.
Lime Green Bad etc. etc.
Rating
Fruit State
Apple Excellent 11
Good 8
Bad 4
Mouldy 0
Pear Excellent 9
Good 5
Bad 2
Mouldy 1
Lime Excellent 10
Good 7
Bad 5
Mouldy 2
And now would like to replace the State value in the fruit DataFrame with the Rating value from the rating DataFrame, ending up with the below.
State more data
Fruit Color
Apple Red 8 etc. etc.
Green 0 etc. etc.
Green 11 etc. etc.
Pear Red 9 etc. etc.
Green 5 etc. etc.
Lime Green 5 etc. etc.
Effectively I want to use pandas.Series.replace
but pass in a dict with a tuple key, but that doesn't appear to be supported.
{'Rating': {('Apple', 'Bad'): 4,
('Apple', 'Excellent'): 11,
('Apple', 'Good'): 8,
('Apple', 'Mouldy'): 0,
('Lime', 'Bad'): 5,
('Lime', 'Excellent'): 10,
('Lime', 'Good'): 7,
('Lime', 'Mouldy'): 2,
('Pear', 'Bad'): 2,
('Pear', 'Excellent'): 9,
('Pear', 'Good'): 5,
('Pear', 'Mouldy'): 1}}
How best would I go about achieving this?
Read two csv as normal dataframe, then merge on Fruit
and State
columns using keys from fruit
dataframe by setting how="left"
. At last set Fruit
and Color
column as index.
import pandas as pd
fruit = pd.read_csv("fruit.csv")
rating = pd.read_csv("rating.csv")
fruit['State'] = fruit.merge(rating, on=["Fruit", "State"], how="left")["Rating"]
fruit.set_index(["Fruit","Color"], inplace=True)
print(fruit)
State more data
Fruit Color
Apple Red 8 etc. etc.
Green 0 etc. etc.
Green 11 etc. etc.
Pear Red 9 etc. etc.
Green 5 etc. etc.
Lime Green 5 etc. etc.