I have two DataFrames, and for each id I want to check whether the values differ between them. If they do, I need to print those rows.
Example:
df1 =
|id|check_column1|
|1|abc|
|1|bcd|
|2|xyz|
|2|mno|
|2|mmm|
df2 =
|id|check_column2|
|1|bcd|
|1|abc|
|2|xyz|
|2|mno|
|2|kkk|
Here the output should be just |2|mmm|kkk|, but I am getting the whole table as output since the indexes are different.
This is what I did:
import pandas as pd
output = pd.merge(df1, df2, on=['id'], how='inner')
event4 = output[output.apply(lambda x: x['check_column1'] != x['check_column2'], axis=1)]
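A minimal sketch of why this returns almost everything: merging on id alone is a many-to-many join, so every df1 row with a given id is paired with every df2 row with the same id (2 x 2 pairs for id 1, 3 x 3 for id 2). Most of those pairs differ even though both frames hold the same values. The snippet rebuilds the example data, assuming plain string columns, and uses a vectorized comparison in place of apply:

```python
import pandas as pd

# Example data from the question
df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Merging on 'id' alone pairs every df1 row with every df2 row of that id
output = pd.merge(df1, df2, on=['id'], how='inner')
print(len(output))  # 2*2 + 3*3 = 13 rows

# Vectorized equivalent of the apply(..., axis=1) filter
event4 = output[output['check_column1'] != output['check_column2']]
print(len(event4))  # 2 mismatched pairs for id 1 + 7 for id 2 = 9 rows
```

Only the pairs xyz/xyz and mno/mno (and abc/abc, bcd/bcd for id 1) are filtered out, so the result still looks like most of the table.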
The idea is to sort the values per id in both DataFrames and merge on id plus a helper counter created by GroupBy.cumcount; after that it is possible to filter out the rows that do not match:
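To see what the helper counter looks like, here is a small sketch on the sorted df1 from the example. GroupBy.cumcount numbers the rows within each id as 0, 1, 2, ... in their current order, so after sorting, the n-th smallest value of an id in df1 is lined up with the n-th smallest value of the same id in df2:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})

# Sort so that values are ordered within each id
df1 = df1.sort_values(['id', 'check_column1'])

# Position of each row inside its id group: 0, 1 for id 1; 0, 1, 2 for id 2
counter = df1.groupby('id').cumcount()
print(counter.tolist())               # [0, 1, 0, 1, 2]
print(df1['check_column1'].tolist())  # ['abc', 'bcd', 'mmm', 'mno', 'xyz']
```

Doing the same on df2 gives ['abc', 'bcd', 'kkk', 'mno', 'xyz'], so only counter 0 of id 2 (mmm vs kkk) differs.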
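import pandas as pd
# Sort so that values are ordered within each id in both frames
df1 = df1.sort_values(['id', 'check_column1'])
df2 = df2.sort_values(['id', 'check_column2'])
# Merge on id plus each row's position inside its id group (helper counter)
df = pd.merge(df1, df2,
              left_on=['id', df1.groupby('id').cumcount()],
              right_on=['id', df2.groupby('id').cumcount()])
# Keep only the aligned pairs whose values differ
output = df[df['check_column1'] != df['check_column2']]
print(output)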
   id  key_1 check_column1 check_column2
2   2      0           mmm           kkk
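Note that the counter approach relies on both frames having the same number of rows per id: the default inner merge silently drops extra rows, and a differing value can shift the sort positions so that values present in both frames get flagged too (e.g. {aaa, mno, xyz} vs {mno, xyz, zzz} pairs up as aaa/mno, mno/xyz, xyz/zzz). A different technique, swapped in here and not part of the answer above, is a multiset difference: number repeated values, outer-merge with indicator=True, keep values found in only one frame, then line the leftovers up per id. A sketch on the example data; the helper names a, b, n, k, value and the value_df1/value_df2 columns are illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column1': ['abc', 'bcd', 'xyz', 'mno', 'mmm']})
df2 = pd.DataFrame({'id': [1, 1, 2, 2, 2],
                    'check_column2': ['bcd', 'abc', 'xyz', 'mno', 'kkk']})

# Use a common column name so values can be matched across frames
a = df1.rename(columns={'check_column1': 'value'})
b = df2.rename(columns={'check_column2': 'value'})

# Number repeated (id, value) pairs so duplicates are matched one-to-one
a['n'] = a.groupby(['id', 'value']).cumcount()
b['n'] = b.groupby(['id', 'value']).cumcount()

# Outer merge with indicator marks values present in only one frame
m = a.merge(b, on=['id', 'value', 'n'], how='outer', indicator=True)
only1 = m.loc[m['_merge'] == 'left_only', ['id', 'value']]
only2 = m.loc[m['_merge'] == 'right_only', ['id', 'value']]

# Line up the leftovers per id; NaN marks a value with no counterpart
only1 = only1.assign(k=only1.groupby('id').cumcount())
only2 = only2.assign(k=only2.groupby('id').cumcount())
output = only1.merge(only2, on=['id', 'k'], how='outer',
                     suffixes=('_df1', '_df2')).drop(columns='k')
print(output)
```

For the example this leaves a single row (id 2, mmm, kkk). Because the last merge is an outer merge, an id with more leftovers on one side shows NaN in the other column instead of disappearing.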