I have an original file and a second file that is identical except that some extra characters have been inserted. I want the list of characters that were added to the second file. I tried difflib, but I get wrong results because characters can be inserted in the middle of a word.
import difflib
with open('file1') as f1:
    f1_text = f1.read()
with open('file2') as f2:
    f2_text = f2.read()
differ = difflib.Differ()
diffs = list(differ.compare(f1_text, f2_text))
lines = list(diffs)
removed = [line[1:] for line in lines if line[0] == '-']
f = open("results", "a")
f.write(''.join(removed))
File1
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
File2
LRorFem ipsum docdlor sit avcvcmet, consGecte5tur adiFbpiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo cocdnseqduat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Result
R F c d c v c m t c o n s G e c t e 5 t r a d F p i s c i n g e l i t , s e d d e i u s m o d t e m p o n c i d i d u n t u t
l a b o r e e t d o l o r e m a g a a . U t e n m a d m i n i m v n i a m , u i s n o s t r u d e x e r c i t a t i o n u l l a m c o l a b o r i s n i s i u t a l i q u i p e x e a c o m m o d o c o c d n s e q d
Expected result: RFcdvcvcG5Fbcdd
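Since file2 is file1 with extra characters inserted, you just need to walk through both files one character at a time. When the two characters match, advance in both files; when they differ, the character from file2 is an added one, so record it and advance only in file2.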
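result = []
with open('file1') as file1, open('file2') as file2:
    ch1, ch2 = file1.read(1), file2.read(1)
    # Loop until file2 is exhausted so trailing insertions are kept too.
    while ch2:
        if ch1 == ch2:
            # Same character in both files: advance both.
            ch1, ch2 = file1.read(1), file2.read(1)
        else:
            # Character only in file2: record it and advance file2 alone.
            result.append(ch2)
            ch2 = file2.read(1)
print(result)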
['R', 'F', 'c', 'd', 'v', 'c', 'v', 'c', 'G', '5', 'F', 'b', 'c', 'd', 'd']
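If the texts are already in memory, the same scan can be written as a small function over strings. This is only a sketch: the name `added_chars` and the `ValueError` check are my own additions, and it assumes the second text is the first one plus insertions only (no deletions or substitutions).

```python
def added_chars(original: str, modified: str) -> str:
    """Return the characters inserted into `original` to produce `modified`.

    Assumption: `modified` is `original` plus insertions only.
    Raises ValueError if that does not hold.
    """
    extra = []
    i = 0  # current position in `original`
    for ch in modified:
        if i < len(original) and ch == original[i]:
            i += 1  # character belongs to the original text
        else:
            extra.append(ch)  # character exists only in the modified text
    if i != len(original):
        # Some original characters were never matched, so the
        # "insertions only" assumption is violated.
        raise ValueError("modified text is not original plus insertions")
    return "".join(extra)


print(added_chars("Lorem ipsum dolor sit amet",
                  "LRorFem ipsum docdlor sit avcvcmet"))
# prints: RFcdvcvc
```

Returning a joined string gives the format shown in the expected result, and the final length check makes a wrong assumption about the inputs fail loudly instead of producing a misleading list.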