python difflib

Get the list of characters added to a file


I have an original file and a second file that contains the same text plus some extra characters. I want the list of characters that were added to the second file. I tried difflib, but the output is wrong because characters can be inserted in the middle of a word.

import difflib

with open('file1') as f1:
    f1_text = f1.read()
with open('file2') as f2:
    f2_text = f2.read()

differ = difflib.Differ()
diffs = list(differ.compare(f1_text, f2_text))

lines = list(diffs)
removed = [line[1:] for line in lines if line[0] == '-']
f = open("results", "a")
f.write(''.join(removed))

File1

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

File2

LRorFem ipsum docdlor sit avcvcmet, consGecte5tur adiFbpiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo cocdnseqduat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Result

R F c d c v c m t c o n s G e c t e 5 t r a d F p i s c i n g e l i t , s e d d e i u s m o d t e m p o n c i d i d u n t u t
l a b o r e e t d o l o r e m a g a a . U t e n m a d m i n i m v n i a m , u i s n o s t r u d e x e r c i t a t i o n u l l a m c o l a b o r i s n i s i u t a l i q u i p e x e a c o m m o d o c o c d n s e q d

Expected result: RFcdvcvcG5Fbcdd


Solution

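  • Since file2 is file1 plus inserted characters, you just need to walk through both files one character at a time: when the characters match, advance both files; otherwise the character from file2 was added, so record it and advance file2 only.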

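    result = []
    
    with open('file1') as file1, open('file2') as file2:
        # Read one character from each file to start.
        ch1, ch2 = file1.read(1), file2.read(1)
        # Loop until file2 is exhausted so additions at the very end are kept.
        while ch2:
            if ch1 == ch2:
                # Same character in both files: advance both.
                ch1, ch2 = file1.read(1), file2.read(1)
            else:
                # Character exists only in file2: record it, advance file2 only.
                result.append(ch2)
                ch2 = file2.read(1)
    
    print(result)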
    
    ['R', 'F', 'c', 'd', 'v', 'c', 'v', 'c', 'G', '5', 'F', 'b', 'c', 'd', 'd']
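
    Since the question started from difflib, here is a minimal sketch of an alternative that stays with that module. It uses difflib.SequenceMatcher on the two strings and collects the parts of the second string that the matcher reports as inserted. The function name added_chars is hypothetical, and difflib does not guarantee the same alignment as the character-by-character loop above, so on some inputs it may report a 'replace' span that includes more than the truly added characters.

```python
import difflib


def added_chars(original, modified):
    """Return the characters difflib reports as added in `modified`.

    `added_chars` is a hypothetical helper name, not part of difflib.
    """
    # autojunk=False disables the "popular element" heuristic, which can
    # otherwise skew character-level matching on long texts.
    matcher = difflib.SequenceMatcher(None, original, modified, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'insert' spans exist only in `modified`; 'replace' spans appear
        # when difflib pairs a changed stretch instead of a pure insertion.
        if tag in ('insert', 'replace'):
            pieces.append(modified[j1:j2])
    return ''.join(pieces)


if __name__ == '__main__':
    print(added_chars('Lorem', 'LRorFem'))  # prints RF
```

    To apply it to the files, read both with f.read() and pass the two strings in. Joining the pieces gives a single string, which matches the format of the expected result in the question.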