Good evening,
let's say I have two dataframes:
Dataframe 1:
id | first_name | last_name | age | personnel_number
1 | Jane | Doe | 37 | 0045ac
2 | John | Doe | 35 | 0102ha
3 | Sarah | Smith | 28 | 1003px
17 | Michael | Mueller | 61 | 0800pw
Dataframe 2:
id | first_name | last_name | age | personnel_number
1 | Jane | Doe | 37 | 0045ac
2 | John | Doe | 35 | 0102ha
3 | Sarah | Smith | 41 | 1003px
4 | Sam | Smith | 24 | 0017ix
I know, that with the following code I'm getting a new dataframe, where the existing rows are getting updated and the new ones are added...
df_comp = df2.set_index('personnel_number').combine_first(df1.set_index('personnel_number')).reset_index()
...to achieve this:
Combinded Dataframe:
id | first_name | last_name | age | personnel_number
1 | Jane | Doe | 37 | 0045ac
2 | John | Doe | 35 | 0102ha
3 | Sarah | Smith | 41 | 1003px
17 | Michael | Mueller | 61 | 0800pw
4 | Sam | Smith | 24 | 0017ix
My question: Is there a way to achieve three instead of one combined dataframe with the following data:
Notes
Thanks for your help and advice and have a nice weekend!
You can try outer merge with indicator and then some conditions followed by a groupby then store in a dictionary:
out = df2.merge(df1,how='outer',indicator='group')
c = out.groupby("personnel_number",sort=False).transform('nunique').gt(1).any(1)
out['group'] = (np.select([out['group'].eq("both"),out['group'].ne("both") & c,
out['group'].isin(['both','left_only']) & ~c],
['Already_exists','Updated','New']))
d = dict(iter(out.groupby("group")))
Output: