Search code examples
pandasdataframemergemany-to-many

How to join two dataframes with many-to-many in the same group using pandas?


I want to merge two dataframes into one, but there are some duplicate values in the key, which is Item. It can't be achieved by using 'cross join' cuz it only uses "cross join" in the same group. Can someone share the thought to solve it? Thanks

For example:

dataframe1:

ID    Item    Price
1     apple     5
1     banana    3
1     lemon     2
2     apple     7
2     banana    4
2     lemon     4 

dataframe2

Item     state
apple      TX
apple      CA
apple      NJ
banana     CA
lemon      NY
lemon      PA

Expected result:

ID    Item    Price   State
1     apple     5       TX
1     apple     5       NJ
1     apple     5       CA
1     banana    3       CA
1     lemon     2       NY
1     lemon     2       PA
2     apple     7       TX
2     apple     7       NJ
2     apple     7       CA
2     banana    4       CA
2     lemon     4       NY
2     lemon     4       PA

Solution

  • You can do:

    pd.merge(df1, df2).sort_values(by=['ID'])
    

    output:

        ID    Item  Price state
    0    1   apple      5    TX
    1    1   apple      5    CA
    2    1   apple      5    NJ
    6    1  banana      3    CA
    8    1   lemon      2    NY
    9    1   lemon      2    PA
    3    2   apple      7    TX
    4    2   apple      7    CA
    5    2   apple      7    NJ
    7    2  banana      4    CA
    10   2   lemon      4    NY
    11   2   lemon      4    PA