Python 3+, Read In Text File and Write to New File Excluding Range of Lines


I'm using Python 3.6 on a Windows machine. I'm reading in a text file using with open() and readlines(). After reading in the lines, I want to write some of them to a new text file but exclude certain ranges of lines. I don't know the line numbers of the lines to exclude. The text files are massive, and the ranges of lines to exclude vary from file to file. There are known keywords I can search for to find the start and end of each range to exclude.

I've searched everywhere online but I can't seem to find an elegant solution that works. The following is an example of what I'm trying to achieve.

a  
b  
BEGIN  
c  
d  
e  
END  
f  
g  
h  
i  
j  
BEGIN  
k  
l  
m  
n  
o  
p  
q  
END  
r  
s  
t  
u  
v  
BEGIN  
w  
x  
y  
END  
z 

In summary, I want to read the above into Python and then write it to a new file, excluding every line from a BEGIN keyword through the next END keyword, including the BEGIN and END lines themselves.

The new file should contain the following:

a  
b  
f  
g  
h  
i  
j  
r  
s  
t  
u  
v  
z  

Solution

  • If the text files are massive, as you say, you'll want to avoid readlines(), since it loads the entire file into memory. Instead, iterate over the file line by line and use a state variable to track whether you're inside a block whose output should be suppressed. Something like this:

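    import re
    
    # Patterns for the lines that open and close an excluded block
    begin_re = re.compile(r"^BEGIN")
    end_re = re.compile(r"^END")
    should_write = True
    
    # Assumes the input is UTF-8; change the encoding if your files differ
    with open("input.txt", encoding="UTF-8") as input_fh:
        with open("output.txt", "w", encoding="UTF-8") as output_fh:
            for line in input_fh:
                # Drop only the trailing newline so any indentation is kept;
                # print() adds the newline back
                line = line.rstrip("\n")
    
                # Stop writing at BEGIN, before the marker itself is written
                if begin_re.match(line):
                    should_write = False
                if should_write:
                    print(line, file=output_fh)
                # Resume writing only after the END marker has been skipped
                if end_re.match(line):
                    should_write = True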
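
If you need this in more than one place, the same state-variable idea can be wrapped in a generator. This is only a sketch: the function name `drop_blocks` and its `begin`/`end` parameters are made up for illustration, and it uses `str.startswith` in place of the regexes above, so it assumes the keywords appear at the very start of a line (a line such as `ENDING` would also count as an end marker).

```python
import io


def drop_blocks(lines, begin="BEGIN", end="END"):
    """Yield the lines that fall outside begin..end blocks.

    The marker lines themselves are dropped as well. `lines` can be any
    iterable of strings, including an open text file, so nothing is
    loaded into memory ahead of time.
    """
    skipping = False
    for line in lines:
        if not skipping and line.startswith(begin):
            # Entering an excluded block: drop the marker line
            skipping = True
        elif skipping and line.startswith(end):
            # Leaving the block: drop the marker line and resume afterwards
            skipping = False
        elif not skipping:
            yield line


# Small in-memory stand-in for a real file
sample = io.StringIO("a\nb\nBEGIN\nc\nd\nEND\nf\n")
kept = list(drop_blocks(sample))
print(kept)
```

With real files, you could pass the open input handle as `lines` and hand the generator straight to the output handle's `writelines()`. The lines keep their original newlines, so nothing needs to be stripped or re-added.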