For example, I have a DataFrame in Python, naming people, with people_name, fav_color, best_friend_name. How should my query be so that I can add a new column that is the favorite color of their best friend, ie best_friend_fav_color?
My data is like:
{'Name' : ['John', 'Leo', 'Mary'],
'fav_color' : ['Red', 'Blue', 'Green'],
'Best_friend_name' : ['Leo', 'Mary', 'John']}
and the expected output would be like:
{'Name' : ['John', 'Leo', 'Mary'],
'fav_color' : ['Red', 'Blue', 'Green'],
'Best_friend_name' : ['Leo', 'Mary', 'John'],
'Best_friend_fav_color' = ['Blue', 'Green', 'Red']}
And we can assume that the best friends will always be in the data frame.
This is fairly easy to do:
First copy your data to make two DataFrames:
df2 = df[['Name', 'fav_color']].copy()
Next, join the two DataFrames on the person's name and best friend's name:
res = df.merge(df2, how='left', left_on='Best_friend_name', right_on='Name')
res
will contain the output you are looking for. It's columns will require renaming. You will notice that you will have fav_color_x
and fav_color_y
. To rename, just assign the correct names to res.columns
, like so:
res.columns = ['Name', 'fav_color', 'Best_friend_name', 'Name2', 'Best_friend_fav_color']
You notice the two Name
columns? You can simply drop 'Name2'.