I am new to Pandas and have tried reviewing different solutions on here but I am having some issues with joining two data frames. They are joining ok, however for some for some columns I would like it to to remain as NULL instead of duplicating
Sample DF1
employee | year |
---|---|
x12345 | 2023 |
x12345 | 2022 |
z12345 | 2015 |
z12345 | 2013 |
Sample DF2
employee | status |
---|---|
x12345 | active |
z12345 | active |
m12345 | inactive |
q12345 | active |
What I have tried
merged = pd.merge(DF1, DF2, how="outer", on=["employee"]
What I currently have
employee | year | status |
---|---|---|
x12345 | 2023 | active |
x12345 | 2022 | active |
z12345 | 2015 | active |
z12345 | 2013 | active |
What I need
employee | year | status |
---|---|---|
x12345 | 2023 | active |
x12345 | 2022 | NULL |
z12345 | 2015 | active |
z12345 | 2013 | NULL |
What I have tried
merged = pd.merge(DF1, DF2, how="outer", on=["employee"]
Code
df1.assign(key=df1.groupby('employee').cumcount())\
.merge(df2.assign(key=df2.groupby('employee').cumcount()), how='left')\
.drop('key', axis=1)
out:
employee year status
0 x12345 2023 active
1 x12345 2022 NaN
2 z12345 2015 active
3 z12345 2013 NaN