
Python: Search text file and write block of lines including previous line to another file


I am searching through a text file and want to copy the block of lines associated with each match to another text file. Once I find the search criteria, I want to write out the previous line and the next 9 lines (10 lines in total) to a file for each match.

Example input file to be searched:

Line 1: File sent to xyz blah blah:
                             Line 2: Search Criteria here
                             Line 3
                             Line 4
                             Line 5
                             Line 6
                             Line 7
                             Line 8
                             Line 9
                             Line 10

Line 1: File sent to xyz blah blah:
                             Line 2: Search Criteria here
                             Line 3
                             Line 4
                             Line 5
                             Line 6
                             Line 7
                             Line 8
                             Line 9
                             Line 10

Code I have started:

searchList = []
searchStr = "Search Criteria here"

with open('', 'rt') as fInput:
    previous = next(fInput)
    for line in fInput:
        if line.find(searchStr) != -1:
            searchList.append(previous)
            searchList.append(line.lstrip('\n'))


with open('Output.txt','a') as fOutput:
    OutPut.write("\n".join(searchList))

The code above saves to a file like this, with a blank line between the first and second lines:

mm/dd/yyy  hh:mm:ss.MMM File sent to xyz:

                             Line 2: Search Criteria here

mm/dd/yyy  hh:mm:ss.MMM File sent to xyz:

                             Line 2: Search Criteria here

I want to save all 10 lines, exactly as they are in the input file.


Solution

  • First, read the file and find the line numbers that match. Keep track of the line numbers for later.

    all_lines = []
    match_lines = []
    
    with open('in_file.txt', 'r') as fInput:
        for number, line in enumerate(fInput):
            all_lines.append(line)
            if searchStr in line:
                match_lines.append(number)
    

    Then, loop over the match_lines list and output the lines you care about from all_lines:

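    num_lines_before = 1
    num_lines_after = 8  # previous line + matching line + 8 following lines = 10 lines
    with open('out_file.txt', 'w') as fOutput:
        for line_number in match_lines:
            # Get a slice containing the lines to write out;
            # max() keeps the start from going negative when a match is near the top
            start = max(0, line_number - num_lines_before)
            output_lines = all_lines[start:line_number + num_lines_after + 1]
            fOutput.writelines(output_lines)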
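
As a side note on why the question's output had blank lines: each line read from a file still ends with its own "\n", so "\n".join(...) puts a second newline between lines, whereas writelines writes the strings unchanged. A small sketch of the difference, using made-up sample lines (they are placeholders, not taken from the real input file):

```python
import io

# Hypothetical sample lines, shaped like what iterating over a file yields:
# each string keeps its trailing newline.
lines = ["header line:\n", "    Search Criteria here\n"]

# Joining with "\n" adds an extra newline between the two lines.
joined = "\n".join(lines)

# writelines writes each string exactly as it is.
buffer = io.StringIO()
buffer.writelines(lines)
written = buffer.getvalue()

print(repr(joined))
print(repr(written))
```

Comparing the two repr values shows the doubled newline only in the joined version.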
    

    To test this, I'm going to create an io.StringIO object to read/write a string as a file, and ask for one line before and two after:

    import io
    
    strIn = """This is some text
    12345
    2 searchforthis
    34567
    45678
    5 searchforthis
    63r23tf
    7pr9e2380
    89spver894
    949erc8m9
    100948rm42"""
    
    all_lines = []
    match_lines = []
    searchStr = "searchforthis"
    
    # with open('in_file.txt', 'r') as fInput:
    with io.StringIO(strIn) as fInput:
        for number, line in enumerate(fInput):
            all_lines.append(line)
            if searchStr in line:
                match_lines.append(number)
    
    num_lines_before = 1
    num_lines_after = 2
    
    
    
    # with open('out_file.txt', 'w') as fOutput:
    with io.StringIO("") as fOutput:
        for line_number in match_lines:
            # Get a slice containing the lines to write out
            start = max(0, line_number - num_lines_before)  # avoid a negative start index
            output_lines = all_lines[start:line_number + num_lines_after + 1]
            fOutput.writelines(output_lines)
            fOutput.write("----------\n") # Just to distinguish matches when we test
        
        fOutput.seek(0)
        print(fOutput.read())
    

    This gives the following output:

    12345
    2 searchforthis
    34567
    45678
    ----------
    45678
    5 searchforthis
    63r23tf
    7pr9e2380
    ----------
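
The approach above keeps every line of the input in memory. If the file is large, a possible variation is to stream it and keep only a small rolling window of previous lines. The sketch below is one way to do that, not part of the original answer: it swaps the list-and-slice technique for collections.deque, and the name iter_match_blocks and its parameters are hypothetical.

```python
import io
from collections import deque


def iter_match_blocks(lines, search_str, num_before=1, num_after=2):
    """Yield one list of lines per match: up to num_before lines of context,
    the matching line, and up to num_after following lines."""
    before = deque(maxlen=num_before)   # rolling window of previous lines
    pending = []                        # [block, lines_still_needed] pairs
    for line in lines:
        # Feed the current line to blocks that still need trailing context.
        for entry in pending:
            entry[0].append(line)
            entry[1] -= 1
        # Hand over blocks that are now complete (oldest first).
        while pending and pending[0][1] == 0:
            yield pending.pop(0)[0]
        if search_str in line:
            block = list(before) + [line]
            if num_after == 0:
                yield block
            else:
                pending.append([block, num_after])
        before.append(line)
    # Matches near the end of the input may have fewer trailing lines.
    for block, _ in pending:
        yield block


# Same sample text as in the test above, read through io.StringIO.
sample = (
    "This is some text\n12345\n2 searchforthis\n34567\n45678\n"
    "5 searchforthis\n63r23tf\n7pr9e2380\n89spver894\n949erc8m9\n100948rm42"
)

blocks = list(iter_match_blocks(io.StringIO(sample), "searchforthis", 1, 2))

with io.StringIO() as out:
    for block in blocks:
        out.writelines(block)
        out.write("----------\n")
    result = out.getvalue()

print(result)
```

With the sample text this yields the same two blocks as the test above. For a real file, the io.StringIO objects would be replaced by open(...) calls, and num_after set to however many following lines are wanted.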