Search code examples
pythonpandasdatabasejoinouter-join

join only two columns from different data frames


I have a data frame that looks like this:

id  sku  status
1.  jd   NULL
2.  io.  DONE

dataframe2:

id  sku  title   role
6   jd   Apple   manager
3.  xz.  Null    employee

I want to join these two on the column "SKU" such that I get a new df like this:

id  sku  status title
1.  jd   NULL.  Apple
2.  io.  DONE.  NULL

So if an SKU from the first dataframe matches with an sku value from the second df, it should give its title value. If an sku value from the first dataframe is not present in the second dataframe, the title should just be NULL. How can I achieve this? Outer join does not seem to work.

df = pd.merge(dfFirst,dfSecond[['sku','title']],how='outer', on='sku')
df.head()

Solution

  • Instead of using Outer join, you can use a Left join as follows:

    df = pd.merge(dfFirst,dfSecond[['sku','title']],how='left', on='sku')
    

    Left join uses only keys from left frame, similar to a SQL left outer join; preserve key order. Hence, it would give you the desired layout you want.

    Result:

    Note: NULL will be interpreted as NaN after processed by Pandas.

    print(df)
    
    
       id  sku status  title
    0   1   jd    NaN  Apple
    1   2  io.   DONE    NaN