Tags: python, python-3.x, text, duplicates

Faster way to remove duplicates from a very large text file in Python?


I have a very large text file with duplicate entries which I want to eliminate. I do not care about the order of the entries because the file will later be sorted.

Here is what I have so far:

# Remember every line seen so far; write each line only on first sight.
unique_lines = set()

with open("MasterList.txt", "r", encoding="latin-1") as infile, \
     open("UniqueMasterList.txt", "w", encoding="latin-1") as outfile:
    for line in infile:
        if line not in unique_lines:
            outfile.write(line)
            unique_lines.add(line)

It has been running for 30 minutes and has not finished. What is a faster approach in Python?


Solution

  • To use the same technique as uniq in Python, sort the file and then drop adjacent duplicates:

    import itertools

    # Sort every line, then let groupby collapse runs of duplicates --
    # with no key function, groupby groups consecutive equal lines, so
    # each distinct line is written exactly once.
    with open("MasterList.txt", "r", encoding="latin-1") as infile:
        sorted_lines = sorted(infile)

    with open("UniqueMasterList.txt", "w", encoding="latin-1") as outfile:
        for line, _ in itertools.groupby(sorted_lines):
            outfile.write(line)

    This presumes that the entire file will fit into memory, twice over. Alternatively, if the file is already sorted, you can skip the sorting step and simply stream through it, as in the sketch below.
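
    For that already-sorted case, here is a minimal streaming sketch (the file names are placeholders): it keeps only the previous line in memory and writes a line whenever it differs from the one before it, which is exactly what uniq does.

    prev = None
    with open("SortedMasterList.txt", "r", encoding="latin-1") as infile, \
         open("UniqueMasterList.txt", "w", encoding="latin-1") as outfile:
        for line in infile:
            # Duplicates are adjacent in a sorted file, so comparing
            # against the previous line is enough to drop them.
            if line != prev:
                outfile.write(line)
                prev = line

    Because it holds only one line at a time, this variant uses constant memory regardless of the file's size.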