python, python-2.7, list, tuples, enumeration

How to print line X lines prior to correct if statement


I am quite new to Python and only have the piecemeal, cookie-cutter knowledge that I have picked up from numerous web pages.

That being said, I am trying to search through a file (~10k lines) using a set of filter-like criteria I wrote, and I want to print each line that meets the criteria AND the line that is X lines before it.

I have written the following script, which opens the file, iterates over it line by line, and prints each line that meets the filter criteria to an output file. However, I am stumped on how to add the look-back step to the current script.

import os

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

numLines = 0
numWords = 0
numChrs = 0
numMes = 0

f1 = open(output_file, 'w')
print 'Output File has been Opened'

with open(filename, 'r') as file:
   for line in file:
      wordsList = line.split()
      numLines += 1
      numWords += len(wordsList)
      numChrs += len(line)

      if "X" in line and "Y" not in line and "Z" in line:
          numMes += 1
          print >>f1, line
          print 'Object found and Catalogued in Output.txt'                          

print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)

print "There are a total of %i things in this file" % (numMes)
print >>f1, "There are a total of %i things in this file" % (numMes)

f1.close()

print 'Output Files have been Closed'

My first guess was using line.enumeration but I don't think I can just state something like lines - 5 to print the line that is 5 before lines:

lines = f1.enumeration()
if "blah blah" in line and "so so" not in line:
    print >>f1, lines
    print >>f1, [lines - 5]

The best part is yet to come though, because I have to take the Output.txt file and compare with another file to output the matching criteria in both files... but one step at a time, right?

-Also feel free to add in blurbs of 'proper' technique... I'm sure this script can be written a better way, so please do educate me on anything I am doing wrong.

Thanks in advance for any help!


UPDATE: Have successfully implemented the fix thanks to the help below:

import os

output_file = 'Output.txt'
filename = 'BigFile.txt'                 

numLines = 0
numWords = 0
numChrs = 0

numMulMes = 0

last5 = []

f1 = open(output_file, 'w')
print 'Output Files have been Opened'

with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChrs += len(line)
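        last5[:] = last5[-5:] + [line]  # previous 5 lines plus the current one (up to 6 entries)
        if "X" in line and "Y" not in line and "Z" not in line:
            del last5[1:5]  # the missing piece of the puzzle: with 6 entries, this leaves the line 5 back and the current line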
            numMulMes += 1
            print >>f1, last5
            print 'Object found and Catalogued in Output.txt'

print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)

print "There are a total of %i messages in this file" % (numMulMes)
print >>f1, "There are a total of %i messages in this file" % (numMulMes)

f1.close()

print 'Output Files have been Closed'

I kept trying to modify the output file with a separate script, and for the longest time I was fighting str vs. list operations and the errors they raised. I finally came back to the original script, threw it in there on a whim, and voilà.

Thanks for pushing me in the right direction, it was easy to figure out from there!


Solution

  • You solved most of it yourself (counting words, lines, line numbers, etc.). What remains is to remember the last n lines while going through your file.

    Example:

    t = """zero line
    one line
    two line
    three line
    four line 
    five line 
    six line
    seven line 
    eight line
    """ 
    
    last5 = [] # memory cell
    for l in t.split("\n"):  # similar to your for line in file: 
        last5[:] = last5[-4:]+[l] # keep last 4 and add current line, inplace list mod 
    
        if "six" in l:
            print last5
    

    You can also use collections.deque with a maximum length (maxlen); it has to be imported:

    from collections import deque
    
    last5 = deque(maxlen=5)
    for l in t.split("\n"): 
        last5.append(l) # will automatically only keep 5 (maxlen)
    
        if "six" in l:
            print last5
    

    Output:

     # list version
     ['two line', 'three line', 'four line ', 'five line ', 'six line'] 
    
     # deque version
     deque(['two line', 'three line', 'four line ', 'five line ', 'six line'], maxlen=5)
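
    If you only want the matching line and the single line that sits X lines before it (rather than the whole window), the deque idea can be adapted roughly as below. This is a minimal sketch, not the only way to do it: the names matches_with_context, wanted, and lookback are made up for illustration, and the sample data stands in for your file. It is written so that it should run unchanged on Python 2.7 and Python 3.

```python
from collections import deque


def matches_with_context(lines, is_match, lookback=5):
    """Yield (earlier_line, line) for each line that satisfies is_match.

    earlier_line is the line `lookback` lines before the match, or None
    when the match is too close to the start of the input.
    """
    # The buffer holds the current line plus the `lookback` lines before it.
    recent = deque(maxlen=lookback + 1)
    for line in lines:
        recent.append(line)
        if is_match(line):
            # Once the buffer is full, recent[0] is exactly `lookback` lines back.
            earlier = recent[0] if len(recent) == lookback + 1 else None
            yield earlier, line


def wanted(line):
    # Hypothetical filter mirroring the criteria from the question.
    return "X" in line and "Y" not in line and "Z" in line


# Stand-in data for the real file: ten lines, one of which matches.
sample = ["line %d\n" % i for i in range(10)]
sample[7] = "X and Z here\n"

results = list(matches_with_context(sample, wanted, lookback=5))
print(results)
```

    With a real file you would pass the open file object as lines (it yields one line at a time) and write each pair with f1.write(...), since the lines already end in a newline. Because the buffer is never modified when a match is found, two matches that fall within a few lines of each other still get the correct earlier line.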