Tags: python, python-3.x, file, string-comparison, difflib

Find differences between two files without checking line by line (Python)


I'm trying to find the differences between two output files that contain a mixture of IP addresses and subnets. These entries are extracted from input files and stored in output1.txt and output2.txt. I'm struggling with the comparison. The files don't always have the same number of lines, so comparing them line by line doesn't seem to be an option. For example, both files could contain the IP address 192.168.1.1, but in output1.txt it could be on line 1 and in output2.txt on line 60. How can I compare the files properly and identify which strings are not in both of them?

Here is my code:

import difflib


with open('input1.txt','r') as f:
    with open('output1.txt', 'w') as g:
        for line in f:
            ipaddress = line.split(None, 1)[0]
            g.write(ipaddress + "\n")
with open('input2.txt', 'r') as f:
    with open('output2.txt', 'w') as g:
        for line in f:
            ipaddress = line.split(None, 1)[0]
            g.write(ipaddress + "\n")

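with open('output1.txt', 'r') as output1, open('output2.txt', 'r') as output2:
    # Differ.compare expects sequences of lines; read() would return one
    # string each, which Differ would compare character by character.
    output1_lines = output1.readlines()
    output2_lines = output2.readlines()
    d = difflib.Differ()
    diff = d.compare(output1_lines, output2_lines)
    print(''.join(diff))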
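For reference, difflib.Differ.compare expects two sequences of lines. Passing two whole strings, such as the values returned by read(), makes it compare character by character, because a string is a sequence of characters. Even with lists of lines, Differ is order-sensitive: an entry that merely moved to a different position is reported as removed in one place and added in another. A minimal sketch with made-up sample data (the variable names are illustrative only):

```python
import difflib

# Hypothetical sample data: the same two entries in a different order.
file1_lines = ["192.168.1.1\n", "10.0.0.0/24\n"]
file2_lines = ["10.0.0.0/24\n", "192.168.1.1\n"]

# Differ.compare yields lines prefixed with "- " (only in the first
# sequence), "+ " (only in the second) or "  " (in both, at this position).
diff = list(difflib.Differ().compare(file1_lines, file2_lines))
print("".join(diff), end="")
```

Because one of the entries shows up with both a "- " and a "+ " prefix even though both lists hold the same entries, an ordered diff is not a good fit when position doesn't matter.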

I will eventually want the differences written to a file, but for now just printing out the result is fine.

Appreciate the help.

Thanks.


Solution

  • Since the line order doesn't matter, you probably want a set comparison:

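    with open('output1.txt') as fh1, open('output2.txt') as fh2:
        # collect stripped, non-blank lines into sets, so a missing trailing
        # newline on the last line doesn't produce a false difference
        set1 = {line.strip() for line in fh1 if line.strip()}
        set2 = {line.strip() for line in fh2 if line.strip()}
        
    diff = set1.symmetric_difference(set2)
    
    # one entry per line, sorted so the output is stable between runs
    print('\n'.join(sorted(diff)))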
    

    Where symmetric_difference will:

    Return a new set with elements in either the set or other but not both.
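The symmetric difference is the union of set1 - set2 and set2 - set1, so computing those two halves separately tells you which file each entry came from. Since the question also mentions eventually writing the differences to a file, here is a minimal sketch of that. The function names normalize, compare_entries and write_report, the "only in ...:" report format and the output file name differences.txt are all hypothetical choices, not part of the original answer. Entries are sorted as strings, not numerically by address.

```python
def normalize(lines):
    """Return the set of non-blank lines with surrounding whitespace removed."""
    return {line.strip() for line in lines if line.strip()}


def compare_entries(lines1, lines2):
    """Return (only_in_first, only_in_second) as sorted lists of strings.

    lines1 and lines2 can be any iterables of strings, including open
    text files, since iterating a text file yields its lines.
    """
    set1, set2 = normalize(lines1), normalize(lines2)
    return sorted(set1 - set2), sorted(set2 - set1)


def write_report(out, label1, only1, label2, only2):
    """Write one 'only in <label>: <entry>' line per difference to out."""
    for entry in only1:
        out.write(f"only in {label1}: {entry}\n")
    for entry in only2:
        out.write(f"only in {label2}: {entry}\n")


# Example usage with the question's file names ("differences.txt" is a
# hypothetical output path):
# with open("output1.txt") as fh1, open("output2.txt") as fh2:
#     only1, only2 = compare_entries(fh1, fh2)
# with open("differences.txt", "w") as out:
#     write_report(out, "output1.txt", only1, "output2.txt", only2)
```

Keeping the comparison separate from the file handling means compare_entries can be checked with plain lists, and write_report accepts any writable file object.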