I have the following dataframe:
df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home","Stay home", 'Go outside', "Go Outside","Go outside"],
'Child' : ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "sunny"]})
Parent Child
0 Stay home Severe weather
1 Stay home Severe weather
2 Stay home Severe weather
3 Go outside Sunny
4 Go Outside Sunny
5 Go outside sunny
And a second one:
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})
Similarity_Score
0 SimilarityScore:0.43693185876069784
1 SimilarityScore:0.299807821163373
I want to merge the two dataframes based on the values of Child values of df1.
Expected outcome:
Parent Child Similarity_Score
0 Stay home Severe weather 0.43693185876069784
1 Stay home Severe weather 0.43693185876069784
2 Stay home Severe weather 0.43693185876069784
3 Go outside Sunny 0.299807821163373
4 Go Outside Sunny 0.299807821163373
5 Go outside sunny 0.299807821163373
I tried the usual merge
and concat
methods but could not find the solution. Any ideas?
If you want to assign the scores based on the Child's value, you can do it like that:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Parent': ['Stay home', "Stay home", "Stay home", 'Go outside', "Go Outside", "Go outside"],
'Child': ['Severe weather', "Severe weather", "Severe weather", 'Sunny', "Sunny", "Sunny"]})
df2 = pd.DataFrame({'Similarity_Score': ['SimilarityScore:0.43693185876069784', 'SimilarityScore:0.299807821163373']})
# Split the string at : and convert to float
df2['Similarity_Score'] = df2['Similarity_Score'].str.split(':').str[1].astype(float)
# calculate auxiliary column position to base the matching on
df1['position'] = df1['Child'].apply(lambda row: np.where(df1['Child'].unique() == row)[0][0])
# merge both dataframes and drop auxiliary column position
df = df1.merge(df2, left_on='position', right_index=True).drop(columns=["position"])