Tags: python, python-3.x, duplicates, text-files, txt

Removing duplicates from a text file using Python


I have this text file and let's say it contains 10 lines.

Bye
Hi
2
3
4
5
Hi
Bye
7
Hi

I want every "Hi" and "Bye" removed except for the first occurrence of each. My current code is below (yes, filename does point to an actual file; I just left that part out of this snippet):

text_file = open(filename)
for i, line in enumerate(text_file):
    if i == 0:
        var_Line1 = line
    if i == 1:
        var_Line2 = line
    if i > 1:
        if line == var_Line2:
            del line
text_file.close()

It does detect the duplicates, but it takes a very long time given how many lines there are, and I'm not sure how to delete them and save the result.


Solution

  • You could use dict.fromkeys to remove duplicates efficiently while preserving order (dicts keep insertion order as of Python 3.7):

    with open(filename, "r") as f:
        lines = dict.fromkeys(f.readlines())
    with open(filename, "w") as f:
        f.writelines(lines)
    

    Idea from Raymond Hettinger
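
One caveat with comparing raw lines: if the file does not end with a newline, a final "Hi" will not match an earlier "Hi\n" and so will survive. The sketch below is one possible variation, not the method from the answer above. It swaps dict.fromkeys for a set, streams the file line by line instead of reading it all at once, and compares lines with the trailing newline stripped. The function name dedupe_file and the write-to-a-temporary-file-then-replace step are assumptions made for illustration.

```python
import os
import tempfile


def dedupe_file(path):
    """Keep only the first occurrence of each line in the file at `path`.

    Hypothetical helper: returns the number of duplicate lines dropped.
    """
    seen = set()
    dropped = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file next to the original, then swap it in,
    # so the original is never left half-written.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            key = line.rstrip("\n")  # so "Hi" and "Hi\n" compare equal
            if key in seen:
                dropped += 1
                continue
            seen.add(key)
            dst.write(key + "\n")
    # Both files are closed here; replace the original with the result.
    os.replace(dst.name, path)
    return dropped
```

With the ten-line sample from the question, dedupe_file(filename) should leave Bye, Hi, 2, 3, 4, 5, 7 in the file and return 3. Only the set of distinct lines is held in memory, which may help when the file is large but contains many repeats.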