Search code examples
pythonpandasmergemerging-data

Merging two data frames with origins


Suppose I have the two following data frames df_1 and df_2 and want to merge them with an additional column which states if the row is present in df_1,df_2 or both. Is there any way I could do this directly while using merge?

df_1

Data Frame 1

df_2

enter image description here

The data frame I would like to have:

enter image description here


Solution

  • Use parameter indicator in merge and for expected output add map by dict:

    df = pd.merge(df1, df2, how='outer', indicator='origin')
    print (df)
       Index   x  y      origin
    0     20   5  8   left_only
    1     35   4  7   left_only
    2     55   3  2        both
    3     60  12  1        both
    4     22   4  3  right_only
    5     30  12  6  right_only
    
    d = {'left_only':'1', 'right_only':'2', 'both':'1 & 2'}
    df['origin'] = df['origin'].map(d)
    print (df)
       Index   x  y origin
    0     20   5  8      1
    1     35   4  7      1
    2     55   3  2  1 & 2
    3     60  12  1  1 & 2
    4     22   4  3      2
    5     30  12  6      2