Tags: python, pandas, dataframe, lambda, merge

Cross-check whether two DataFrames have different values and print any that differ


I have two DataFrames, and for each id I want to check whether the values differ between them. If they do, I need to print those values.

Example:

df1 =
      |id |check_column1|
      |1  |abc          |
      |1  |bcd          |
      |2  |xyz          |
      |2  |mno          |
      |2  |mmm          |
df2 =
      |id |check_column2|
      |1  |bcd          |
      |1  |abc          |
      |2  |xyz          |
      |2  |mno          |
      |2  |kkk          |

Here the output should be just |2|mmm|kkk|, but I am getting the whole table as output, since the indexes are different.

This is what I tried:

output = pd.merge(df1, df2, on=['id'], how='inner')

event4 = output[output.apply(lambda x: x['check_column1'] != x['check_column2'], axis=1)]
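
For reference, here is a small sketch of why this attempt returns so many rows. It rebuilds the sample tables from the example above (the DataFrame construction is an assumption about how the data is stored) and counts the rows. Merging on id alone pairs every df1 row with every df2 row that shares that id, so the inequality filter keeps every differing cross pair, not just the one real mismatch.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question (assumed layout).
df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Merging on 'id' alone gives every df1/df2 combination within each id.
merged = pd.merge(df1, df2, on=['id'], how='inner')
print(len(merged))  # 13 rows: 2*2 for id 1 plus 3*3 for id 2

# Same filter as in the attempt, written as a vectorised comparison.
event4 = merged[merged['check_column1'] != merged['check_column2']]
print(len(event4))  # 9 rows: every cross pair whose values differ
```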

Solution

  • The idea is to sort the values per id in both DataFrames, join them on id plus a helper counter created by GroupBy.cumcount, and then filter out the rows that do not match:

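    # Sort so that equal values line up at the same position within each id.
    df1 = df1.sort_values(['id', 'check_column1'])
    df2 = df2.sort_values(['id', 'check_column2'])

    # Join on id plus each row's position within its id group (the helper counter).
    df = pd.merge(df1, df2, left_on=['id', df1.groupby('id').cumcount()],
                            right_on=['id', df2.groupby('id').cumcount()])

    # Keep only the pairs whose values differ.
    output = df[df['check_column1'] != df['check_column2']]
    print(output)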
       id  key_1 check_column1 check_column2
    2   2      0           mmm           kkk
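
The steps above can also be sketched as a self-contained script. The DataFrame construction is rebuilt from the question's tables. The helper name `mismatched_rows` and the `pos` counter column are hypothetical, introduced only for illustration. This version stores the counter in a named column instead of passing it inline to the merge, so the result has no `key_1` column.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question (assumed layout).
df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})


def mismatched_rows(left, right, key, left_col, right_col):
    """Hypothetical helper: pair sorted values within each key and keep differing pairs."""
    left = left.sort_values([key, left_col]).reset_index(drop=True)
    right = right.sort_values([key, right_col]).reset_index(drop=True)
    # Position of each row inside its key group, used as a second join key.
    left['pos'] = left.groupby(key).cumcount()
    right['pos'] = right.groupby(key).cumcount()
    merged = left.merge(right, on=[key, 'pos'])
    differs = merged[left_col] != merged[right_col]
    return merged.loc[differs, [key, left_col, right_col]]


result = mismatched_rows(df1, df2, 'id', 'check_column1', 'check_column2')
print(result.to_string(index=False))
```

Note that the pairing is positional after sorting. If a differing value shifts the sort order of the rows after it, several pairs may be reported instead of one. With the default inner merge, any extra rows in the larger group for an id are dropped.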