python, large-data, large-files

Python: Compare two large files


This is a follow-up question to "Compare two large files", which was answered by phihag.

I want to display the number of lines that differ after comparing two files. The count should be printed as a message once the program has finished.

My try:

# file1, file2 and file3 are file paths defined elsewhere.
# Write the lines of file1 that do not appear in file2 to file3.
with open(file2) as b:
  blines = set(b)
with open(file1) as a:
  with open(file3, 'w') as result:
    for line in a:
      if line not in blines:
        result.write(line)

# Collect the same lines in a list so that they can be counted.
with open(file2) as b:
  blines = set(b)
with open(file1) as a:
  lines_to_write = [line for line in a if line not in blines]

print('Number of lines that differ:', len(lines_to_write))

Solution

  • If you can load everything into memory, you can perform the following operations on sets:

    union = set(alines).union(blines)
    intersection = set(alines).intersection(blines)
    unique = union - intersection
    

    EDIT: Even simpler (and faster) is:

    set(alines).symmetric_difference(blines)
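
To get the count the question asks for, the length of that symmetric difference can be printed once the comparison is done. The following is only a minimal sketch: the helper name `count_differing_lines` and the temporary demo files are assumptions made for illustration, not part of the original answer. It also assumes both files fit in memory.

```python
import os
import tempfile


def count_differing_lines(path_a, path_b):
    """Return how many distinct lines appear in exactly one of the two files.

    Hypothetical helper: both files are loaded into sets, so they must
    fit in memory, and repeated lines within a file are counted once.
    """
    with open(path_a) as a:
        alines = set(a)
    with open(path_b) as b:
        blines = set(b)
    return len(alines.symmetric_difference(blines))


# Small demo; the temporary files stand in for the real inputs.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")
    path_b = os.path.join(tmp, "b.txt")
    with open(path_a, "w") as f:
        f.write("one\ntwo\nthree\n")
    with open(path_b, "w") as f:
        f.write("two\nthree\nfour\n")
    diff_count = count_differing_lines(path_a, path_b)
    print("Number of lines that differ:", diff_count)
```

Note that this counts lines from both files, whereas the loop in the question only counts lines of the first file that are missing from the second. Lines are compared including their trailing newline, so a final line without a newline will not match an otherwise identical line that has one.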