python pandas dataframe merge outer-join

Duplication issues when outer-merging with pandas


I have an issue with duplicated values when merging in pandas. I have two dataframes that I must outer-join. For example, df1 is

id type value1
1 a 100
1 b 200

where id==1 has two types with different values. I want to join this with another dataframe, df2:

id value2 value3
1 50 300

I am merging the two using

df_merged = df1.merge(df2,how='outer',on='id')

The result is

id type value1 value2 value3
1 a 100 50 300
1 b 200 50 300

where value2 and value3 are clearly duplicated, which causes problems if I want to, e.g., sum value2 or value3 (the sum of value2 would be 100 instead of 50). Is there a way to merge and instead get something like

id type value1 value2 value3
1 a 100 50 300
1 b 200 NaN NaN

or is there some other approach?

Thanks!


Solution

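  • You could merge as you described, and then set the repeated values to NaN. Here dupe_cols is the list of columns that came from df2, and numpy must be imported as np:

    dupe_cols = ['value2', 'value3']  # columns brought in from df2
    df_merged.loc[df_merged.duplicated(subset=['id'] + dupe_cols), dupe_cols] = np.nan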
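
For reference, here is a minimal end-to-end sketch of that idea using the two small frames from the question. A few things in it are assumptions rather than part of the original answer: the name is_repeat is just illustrative, id is included in the duplicated() subset so that equal values under different ids are not treated as repeats, and the df2 columns are cast to float first because NaN does not fit in an integer column (recent pandas versions may warn or refuse to upcast during assignment).

```python
import numpy as np
import pandas as pd

# Frames from the question
df1 = pd.DataFrame({"id": [1, 1], "type": ["a", "b"], "value1": [100, 200]})
df2 = pd.DataFrame({"id": [1], "value2": [50], "value3": [300]})

df_merged = df1.merge(df2, how="outer", on="id")

# Columns that came from df2 and get repeated for every matching df1 row
dupe_cols = ["value2", "value3"]

# Assumption: cast to float up front so NaN can be stored without upcasting
df_merged[dupe_cols] = df_merged[dupe_cols].astype("float64")

# True for every row that repeats an earlier (id, value2, value3) combination
is_repeat = df_merged.duplicated(subset=["id"] + dupe_cols)

# Keep the first occurrence, blank out the repeats
df_merged.loc[is_repeat, dupe_cols] = np.nan

print(df_merged)
print(df_merged["value2"].sum())  # 50.0, since sum() skips NaN by default
```

One caveat with this approach: if df2 itself legitimately holds two identical rows for the same id, the second one would also be blanked, so it is worth checking df2 for duplicates before merging.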