I'm having trouble to find a solution to this issue where I have 2 dataframes, df1 and df2.
df1
C1 C2
0 apple 5
1 grape 11
2 orange 7
3 cherry 9
df2
Item
0 apple soda
1 cherry cola
2 grape juice
3 apple candy
4 bubblegum apple
5 orange citrus
6 funky grapefruit
7 sweet banana
How do I search the strings in df2 and check if a substring exists in df1 which then returns the value from df1 into a new column in df2?
Here's the result I'm trying to achieve:
df2
Item C2
0 apple soda 5
1 cherry cola 9
2 grape juice 11
3 apple candy 5
4 bubblegum apple 5
5 orange citrus 7
6 funky grapefruit 11
7 sweet banana Null
Here is solution:
import pandas as pd
data1 = {'C1': ['apple', 'grape', 'orange', 'cherry'],
'C2': [5, 11, 7, 9]}
data2 = {'Item': ['apple soda', 'cherry cola', 'grape juice', 'apple candy',
'bubblegum apple', 'orange citrus', 'funky grapefruit', 'sweet banana']}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
def find_matching_value(row):
matching_rows = df1[df1['C1'].apply(lambda x: x in row['Item'])]
if not matching_rows.empty:
return matching_rows['C2'].iloc[0]
return None
df2['C2'] = df2.apply(find_matching_value, axis=1)
print(df2)
Output:
Item C2
0 apple soda 5.0
1 cherry cola 9.0
2 grape juice 11.0
3 apple candy 5.0
4 bubblegum apple 5.0
5 orange citrus 7.0
6 funky grapefruit 11.0
7 sweet banana NaN