Search code examples
python-3.xpandasdataframetxt

Compare text file and pandas dataframe based on a condition


I have a pandas dataframe with 4 columns.

dataframe 1:

file    test1   test2   test3   test4
50001   6789999 1789999 89999   69999
50002   76  999 6999    6789
50003   1790001 67999   67999   9789999

and a test.txt file with same four columns. The text file has no headers and column values will be identified based on certain conditions.

50001   6789999 1789999 89999   69999
50002   5790000 999 6999    6789
50003   1790001 67999   67999   9789999

I want to compare both files with the common element filename and write the output in two different files (matched and unmatched rows ). matched - if all columns are matched. unmatched - even if one column is not matched.

Expected output:

Matched file.txt:

50001   6789999 1789999 89999   69999
50003   1790001 67999   67999   9789999

Unmatched file.txt:

50002   76  999 6999    6789

Any pointers would be welcome


Solution

  • You can force df1's columns on df2 with set_axis, then merge with indicator=True and split the rows based on the match:

    tmp = df1.merge(df2.set_axis(df1.columns, axis=1),
                    how='left', indicator=True)
    
    match = tmp.pop('_merge').eq('both')
    
    tmp[match].to_csv('matched.txt', index=False)
    tmp[~match].to_csv('unmatched.txt', index=False)
    

    Output:

    # matched.txt
    file,test1,test2,test3,test4
    50001,6789999,1789999,89999,69999
    50003,1790001,67999,67999,9789999
    
    # unmatched.txt
    file,test1,test2,test3,test4
    50002,76,999,6999,6789