I have a data frame that looks like this:
id sku status
1. jd NULL
2. io. DONE
dataframe2:
id sku title role
6 jd Apple manager
3. xz. Null employee
I want to join these two on the column "SKU" such that I get a new df like this:
id sku status title
1. jd NULL. Apple
2. io. DONE. NULL
So if an SKU from the first dataframe matches with an sku value from the second df, it should give its title value. If an sku value from the first dataframe is not present in the second dataframe, the title should just be NULL. How can I achieve this? Outer join does not seem to work.
df = pd.merge(dfFirst,dfSecond[['sku','title']],how='outer', on='sku')
df.head()
Instead of using Outer join, you can use a Left join as follows:
df = pd.merge(dfFirst,dfSecond[['sku','title']],how='left', on='sku')
Left join uses only keys from left frame, similar to a SQL left outer join; preserve key order. Hence, it would give you the desired layout you want.
Result:
Note: NULL
will be interpreted as NaN
after processed by Pandas.
print(df)
id sku status title
0 1 jd NaN Apple
1 2 io. DONE NaN