python-3.x python-re

Is there a faster way to extract lines from a file?


I have a set of files that I need to search through to extract certain lines. Right now I'm using a for loop, but this is proving costly in terms of time. Is there a faster way than the code below?

import re

for file in files:
        localfile = open(file, 'r')
        for line in localfile:
                if re.search("Common English Words", line):
                      words = line.split("|")[0]
                      # Append words to file words.txt
                      open("words.txt","a+").write(words + "\n")

Solution

  • Well, for one thing, you are opening words.txt again (and creating a new file descriptor) every time you write to it. I ran some tests and found that Python's garbage collection does in fact close open file descriptors once they become inaccessible (at least in my test case). Even so, opening the file every time you want to append a line is going to be costly. For future reference, it is considered good practice to open files in with ... as blocks.

    TLDR: One improvement you could make is to open the file you are writing to just once. Here is what that would look like:

    import re

    # Open the output file once, outside the loops
    with open("words.txt", "a+") as words_file:
        for file in files:
            localfile = open(file, 'r')
            for line in localfile:
                if re.search("Common English Words", line):
                    words = line.split("|")[0]
                    # Append words to file words.txt
                    words_file.write(words + "\n")
    

    As I said, opening files in with ... as statements is considered best practice, because each file is then closed automatically when its block ends. We can apply this to the input files as well, like so:

    import re

    with open("words.txt", "a+") as words_file:
        for file in files:
            with open(file, 'r') as localfile:
                for line in localfile:
                    if re.search("Common English Words", line):
                        words = line.split("|")[0]
                        # Append words to file words.txt
                        words_file.write(words + "\n")
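
For anyone who wants to try this without the original data, here is a minimal, self-contained sketch of the same approach wrapped in a function. The function name extract_words, its parameters, and the sample file contents are all made up for illustration. Compiling the pattern once with re.compile is a small optional tweak that is not part of the answer above, and I have not measured how much it helps.

```python
import re
import tempfile
from pathlib import Path


def extract_words(paths, out_path, pattern="Common English Words"):
    """Append the first |-separated field of every matching line to out_path.

    Hypothetical helper for illustration; returns the number of lines written.
    """
    matcher = re.compile(pattern)  # compiled once, reused for every line
    written = 0
    # Open the output file a single time instead of once per matching line.
    with open(out_path, "a+") as words_file:
        for path in paths:
            # Each input file is closed as soon as its with block ends.
            with open(path, "r") as localfile:
                for line in localfile:
                    if matcher.search(line):
                        words_file.write(line.split("|")[0] + "\n")
                        written += 1
    return written


# Small demonstration on throwaway files (the sample contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "a.txt"
    second = Path(tmp) / "b.txt"
    first.write_text("apple|Common English Words\nzyzzyva|Rare Words\n")
    second.write_text("house|Common English Words\n")
    out_file = Path(tmp) / "words.txt"
    count = extract_words([first, second], out_file)
    result = out_file.read_text()
```

Here only the two lines containing the marker text are written, so words.txt ends up holding apple and house, one per line.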