Search code examples
pandasdataframefuzzywuzzy

Fuzzy Match two dataframe based on list value column


I have two Dataframes that I am trying to merge by match to values in a column. The column to match are header1 in Df1 and header2 in Df2 (the value is list). I am looking for an adapted approach.

Df1:                                          Df2:
|--------------|---------------|              |--------------|------------------------|       
|      id_1    |     header1   |              |      id_2    |    header2             |
|--------------|---------------|              |--------------|------------------------|
|      112     |      item1    |              |      32      |['item1','item2',...]   |
|--------------|---------------|              |--------------|------------------------|
|      56      |      mass     | 
|--------------|---------------|

Wanted results:

Df_merged:                                       
|--------------|---------------|--------------|------------------------|       
|      id_1    |     header1   |      id_2    |    header2             |
|--------------|---------------|--------------|------------------------|
|      112     |      item1    |      32      |['item1','item2',...]   |
|--------------|---------------|--------------|------------------------|

Does anyone know how to merge these two dataframe ?. I have absolutely no idea how to handle this. Thanks a lot in advance.


Solution

  • Use DataFrame.explode with reassign header2 to header1 for avoid lost original column header2 and then use DataFrame.merge:

    df = df1.merge(df2.assign(header1 = df2['header2']).explode('header1'), on='header1')