I have a pandas dataframe with 4 columns.
dataframe 1:
file test1 test2 test3 test4
50001 6789999 1789999 89999 69999
50002 76 999 6999 6789
50003 1790001 67999 67999 9789999
and a test.txt file with same four columns. The text file has no headers and column values will be identified based on certain conditions.
50001 6789999 1789999 89999 69999
50002 5790000 999 6999 6789
50003 1790001 67999 67999 9789999
I want to compare both files with the common element filename and write the output in two different files (matched and unmatched rows ). matched - if all columns are matched. unmatched - even if one column is not matched.
Expected output:
Matched file.txt:
50001 6789999 1789999 89999 69999
50003 1790001 67999 67999 9789999
Unmatched file.txt:
50002 76 999 6999 6789
Any pointers would be welcome
You can force df1
's columns on df2
with set_axis
, then merge
with indicator=True
and split the rows based on the match:
tmp = df1.merge(df2.set_axis(df1.columns, axis=1),
how='left', indicator=True)
match = tmp.pop('_merge').eq('both')
tmp[match].to_csv('matched.txt', index=False)
tmp[~match].to_csv('unmatched.txt', index=False)
Output:
# matched.txt
file,test1,test2,test3,test4
50001,6789999,1789999,89999,69999
50003,1790001,67999,67999,9789999
# unmatched.txt
file,test1,test2,test3,test4
50002,76,999,6999,6789