file2 has a big list of numbers, and file1 has a small list of numbers. file2 duplicates some of the numbers in file1. I want to remove from file1 the numbers that also appear in file2, without deleting any data from file2 and, at the same time, without deleting the corresponding line in file1 (I use the PyCharm IDE, which shows the line numbers). The code below does remove the duplicate data from file1 and leaves file2 untouched, which is what I want. However, when it rewrites file1 it deletes the lines themselves along with the duplicate numbers, which is what I don't want.
import fileinput
# small file2
with open('file2.txt') as fin:
    exclude = set(line.rstrip() for line in fin)
# big file1
for line in fileinput.input('file1.txt', inplace=True):
    if line.rstrip() not in exclude:
        print(line)
Example of what is happening, with file2 containing 34344:
file-1 at start:
54545
34344
23232
78787
file-1 end:
54545
23232
78787
What I want.
file-1 start:
54545
34344
23232
78787
file-1 end (the line that held 34344 is kept, but empty):
54545

23232
78787
You just need to print an empty line whenever you find a value that is in the exclude
set.
import fileinput
# small file2
with open('file2.txt') as fin:
    exclude = set(line.rstrip() for line in fin)
# big file1
for line in fileinput.input('file1.txt', inplace=True):
    if line.rstrip() not in exclude:
        print(line, end='')
    else:
        print('')
If file1.txt is:
54545
1313
23232
13551
And file2.txt is:
1313
13551
After running the script above, file1.txt becomes (lines 2 and 4 are now empty):
54545

23232

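The same before/after can be checked in memory, without touching any file. This is only a small sketch: the helper name blank_excluded is made up for illustration and is not part of the script above.

```python
def blank_excluded(lines, exclude):
    """Return a copy of lines where every entry found in exclude becomes ''."""
    return [line if line not in exclude else '' for line in lines]


file1 = ['54545', '1313', '23232', '13551']   # contents of file1.txt, one entry per line
file2 = {'1313', '13551'}                     # contents of file2.txt as a set

result = blank_excluded(file1, file2)
print(result)                     # ['54545', '', '23232', '']
print(len(result) == len(file1))  # True: the line count is unchanged
```

Because every input line produces exactly one output line, the remaining numbers keep their original line numbers.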
As you said, this code does in fact rewrite every line, edited or not. Deleting and rewriting only a few lines in the middle of a file is not easy, and in any case I am not sure it would be more efficient here: since you do not know in advance which lines need editing, you always have to read and process the whole file line by line to find them. As far as I know, you will hardly find a solution that is much more efficient than this one. I would be glad to be proven wrong if anybody knows how.
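If you would rather not depend on fileinput redirecting print, the same full rewrite can be spelled out by hand. The following is a sketch under assumptions, not a faster method: the function name blank_lines_in_file is hypothetical, and it still reads and rewrites the whole file. It writes to a temporary file in the same directory and then swaps it in with os.replace, so an error halfway through should leave the original file as it was.

```python
import os
import tempfile


def blank_lines_in_file(target_path, exclude_path):
    """Blank out every line of target_path whose value also occurs in exclude_path."""
    with open(exclude_path) as fin:
        exclude = set(line.rstrip() for line in fin)

    # Create the temporary file next to the target so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, 'w') as fout, open(target_path) as fin:
            for line in fin:
                if line.rstrip() in exclude:
                    fout.write('\n')   # keep the line, drop its content
                else:
                    fout.write(line)   # copy unchanged
        os.replace(tmp_path, target_path)
    except BaseException:
        os.remove(tmp_path)            # clean up the partial temporary file
        raise
```

Calling blank_lines_in_file('file1.txt', 'file2.txt') should give the same result as the fileinput script. One difference to be aware of: the replaced file takes the permissions of the temporary file, not those of the original.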