Search code examples
pythonpandasjoin

JOIN two dataframes on common column in pandas


I have a dataframe df:

id   name   count
1    a       10
2    b       20
3    c       30
4    d       40
5    e       50

Here I have another dataframe df2:

id1  price   rating
 1     100     1.0
 2     200     2.0
 3     300     3.0
 5     500     5.0

I want to join these two dataframes on column id and id1. Here is an example of df3:

id   name   count   price   rating
1    a       10      100      1.0
2    b       20      200      2.0
3    c       30      300      3.0
4    d       40      Nan      Nan
5    e       50      500      5.0

Should I use df.merge or pd.concat?


Solution

  • Use merge:

    print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left').drop('id1', axis=1))
       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
    

    Another solution is simple rename column:

    print (pd.merge(df1, df2.rename(columns={'id1':'id'}), on='id',  how='left'))
       id name  count  price  rating
    0   1    a     10  100.0     1.0
    1   2    b     20  200.0     2.0
    2   3    c     30  300.0     3.0
    3   4    d     40    NaN     NaN
    4   5    e     50  500.0     5.0
    

    If need only column price the simpliest is map:

    df1['price'] = df1.id.map(df2.set_index('id1')['price'])
    print (df1)
       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
    

    Another 2 solutions:

    print (pd.merge(df1, df2, left_on='id', right_on='id1', how='left')
             .drop(['id1', 'rating'], axis=1))
       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0
    

    print (pd.merge(df1, df2[['id1','price']], left_on='id', right_on='id1', how='left')
             .drop('id1', axis=1))
       id name  count  price
    0   1    a     10  100.0
    1   2    b     20  200.0
    2   3    c     30  300.0
    3   4    d     40    NaN
    4   5    e     50  500.0