python pandas data-science analysis

Compare two pandas DataFrames to extract the differences


I have two pandas DataFrames and I want to compare them to see what the differences are. df1 has a column of unique IDs, a column of text data, and a column of numeric data. df2 has the same structure but contains multiple records for the same IDs. I want to take each ID from df1 with its corresponding columns, compare it to all the matching IDs in df2 and their corresponding columns, then put the rows that differ into a new df3.

EDIT: df3 should not contain rows from df1 that do not exist in df2.

import pandas as pd
data1 = {'ID':['L1', 'L2', 'L3', 'L4'], 'Text':['1A', '1B','1C','1D'], 'Num':[1, 2, 3, 4]}
df1 = pd.DataFrame(data1)
print(df1)
ID Text Num
L1 1A 1
L2 1B 2
L3 1C 3
L4 1D 4
data2 = {'ID':['L1', 'L2', 'L3', 'L1', 'L2', 'L3'], 'Text':['1A','1B','1C','2A','2B','1C'], 'Num':[1, 2, 3, 11,2,123]}
df2 = pd.DataFrame(data2)
print(df2)
ID Text Num
L1 1A 1
L2 1B 2
L3 1C 3
L1 2A 11
L2 2B 2
L3 1C 123

I want the output to look like:

ID Text Num
L1 2A 11
L2 2B 2
L3 1C 123

Solution

  • You can use an outer merge with indicator:

    (df1.merge(df2, how='outer', indicator=True)
        .loc[lambda d: d.pop('_merge').eq('right_only')]
     )
    

    Output:

       ID Text  Num
    4  L1   2A   11
    5  L2   2B    2
    6  L3   1C  123
    

    NB. If you need to keep df2's original index, call reset_index() on df2 before the merge, then set_index('index') afterwards.
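
For reference, here is a self-contained sketch of the approach above, using the data from the question. It splits the one-liner into steps so the `_merge` column is visible, and adds a variant that keeps df2's row labels as the NB suggests. The names `merged` and `df3_keep_index` are only illustrative, and the cast of the `index` column back to an integer is an added assumption (an outer merge leaves NaN in that column for rows that exist only in df1, which turns it into floats).

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["L1", "L2", "L3", "L4"],
    "Text": ["1A", "1B", "1C", "1D"],
    "Num": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "ID": ["L1", "L2", "L3", "L1", "L2", "L3"],
    "Text": ["1A", "1B", "1C", "2A", "2B", "1C"],
    "Num": [1, 2, 3, 11, 2, 123],
})

# With no `on=` argument, merge joins on every shared column (ID, Text, Num).
# indicator=True adds a `_merge` column: 'both', 'left_only' or 'right_only'.
merged = df1.merge(df2, how="outer", indicator=True)

# Rows that exist only in df2 are the differences. L4 exists only in df1,
# so it is tagged 'left_only' and dropped by this filter.
df3 = (
    merged[merged["_merge"] == "right_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(df3)

# Variant keeping df2's original row labels: only df2 is reset, so the new
# 'index' column is not shared and does not become a join key.
df3_keep_index = (
    df1.merge(df2.reset_index(), how="outer", indicator=True)
    .loc[lambda d: d.pop("_merge").eq("right_only")]
    .astype({"index": "int64"})  # assumption: restore integer labels after the NaN upcast
    .set_index("index")
)
print(df3_keep_index)
```

The row labels of the intermediate `merged` frame can depend on how the pandas version in use orders the result of an outer merge, which is why the first variant resets the index instead of relying on it.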