I have two python dataframes: one of them have a column has a row 'AC-2' another data frame includes columns which has a string row 'AC-20,AC8,AC-15'
str_match = "({})".format("|".join(df1['column_first_dataframe']))
df2.merge(df2,how='left',left_on=df1['column_first_dataframe'].str.extract(str_match)[0], right_on="column_second_dataframe")
First dataset:
Column1 Column2
AC-2 2
AC-20 1
AC-15 3
AC-1 2
AC-5 5
second dataset:
Column1
AC-2,AC-5,AC-30
AC-20,AC-30,AC11
I found:
Column1 Column2
AC-2 AC-20,AC-30,AC11
AC-2 AC-2,AC-5,AC-30
AC-20 AC-20,AC-30,AC11
AC-15 null
AC-1 null
AC-5 AC-2,AC-5,AC-30
above there is a matching between AC-2 for dataset1 and AC-20 in string for dataset 2 but they are different thing in my dataset. my desired output:
Column1 Column2
AC-2 AC-2,AC-5,AC-30
AC-20 AC-20,AC-30,AC11
AC-15 null
AC-1 null
AC-5 AC-2,AC-5,AC-30
I tried to merge data frames with searching and matching BUT my code merged AC-2 to AC-20. I don't want to do that. Are there any searching and matching way to merge my data frames exactly what I want. I don't want to merge AC-20 and AC-2. It should merge AC-2 and AC-2 with exactly same thing.
Thank you for your contributions!
A simple way is to split
and explode
the Column2 of df2 to get one row per full word and perform a simple left merge
.
(df1
.merge(df2.assign(group=df2['Column1'].str.split(','))
.rename(columns={'Column1': 'Column3'})
.explode('group'),
left_on='Column1', right_on='group', how='left'
)
.drop(columns='group')
)
output:
Column1 Column2 Column3
0 AC-2 2 AC-2,AC-5,AC-30
1 AC-20 1 AC-20,AC-30,AC11
2 AC-15 3 NaN
3 AC-1 2 NaN
4 AC-5 5 AC-2,AC-5,AC-30
NB. I am producing above a slightly different output to keep all columns. If really you want your output, simply drop "Column2" in df1 and rename "Column1" into "Column2" in df2
(df1
.drop(columns='Column2')
.merge(df2.rename(columns={'Column1': 'Column2'})
.assign(Column1=df2['Column1'].str.split(','))
.explode('Column1'),
on='Column1', how='left'
)
)
output:
Column1 Column2
0 AC-2 AC-2,AC-5,AC-30
1 AC-20 AC-20,AC-30,AC11
2 AC-15 NaN
3 AC-1 NaN
4 AC-5 AC-2,AC-5,AC-30