python python-3.x duplicates text-files txt

Removing duplicates from text file using python

I have a text file; let's say it contains these 10 lines:

Bye
Hi
2
3
4
5
Hi
Bye
7
Hi

I want to remove every occurrence of "Hi" and "Bye" except the first of each. My current code is below (filename does point to a real file; I just left the assignment out of this snippet):

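text_file = open(filename)
for i, line in enumerate(text_file):
    if i == 0:
        var_Line1 = line  # remember the first line ("Bye")
    if i == 1:
        var_Line2 = line  # remember the second line ("Hi")
    if i > 1:
        if line == var_Line2:
            del line  # only unbinds the loop variable; the file is unchanged
text_file.close()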

It does detect the duplicates, but it takes a very long time given how many lines the file has, and I'm not sure how to delete them and save the result.

Solution

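You could use dict.fromkeys to remove duplicate lines efficiently while preserving their order (dicts keep insertion order in Python 3.7+). Note that this drops every repeated line, not only "Hi" and "Bye":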

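with open(filename, "r") as f:
    # Dict keys are unique and keep insertion order, so repeats are dropped.
    lines = dict.fromkeys(f.readlines())
with open(filename, "w") as f:
    # Iterating over a dict yields its keys, i.e. the unique lines.
    f.writelines(lines)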

Idea from Raymond Hettinger
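
If the file is too large to read comfortably in one go, or if its last line lacks a trailing newline (so that "Hi" and "Hi\n" would count as different lines), a streaming variant may help. The following is a minimal sketch and not part of the original answer: the function name dedupe_lines and its only parameter are hypothetical, and it assumes it is acceptable for the rewritten file to end every line, including the last, with a newline.

```python
import os
import tempfile


def dedupe_lines(path, only=None):
    """Rewrite `path`, keeping only the first occurrence of repeated lines.

    `only` is a hypothetical optional set of line texts (without the
    newline); when given, only those lines are deduplicated and all
    other lines are copied through unchanged.
    """
    seen = set()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in,
    # so the original is not left half-written if something fails.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            key = line.rstrip("\n")  # so "Hi" and "Hi\n" compare equal
            if only is None or key in only:
                if key in seen:
                    continue  # a repeat: skip it
                seen.add(key)
            dst.write(key + "\n")
    os.replace(dst.name, path)
```

Calling dedupe_lines(filename) removes every repeated line, while dedupe_lines(filename, only={"Hi", "Bye"}) touches only those two, which is closer to what the question literally asks for. Set membership tests take constant time on average, so the whole pass stays linear in the number of lines.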