I am quite new to Python and only have the piecemeal, cookie-cutter knowledge I have picked up from numerous web pages.
That being said, I am trying to search through a file (~10k lines) using a set of 'filter'-like criteria I wrote, and I want to print each line that fits the criteria AND the line that is X lines before it.
I have created the following script to open the file, iterate line by line, and print each line that meets the filter criteria to an output file. However, I am stumped on how to add the "X lines before" part to the current script.
import os
output_file = 'Output.txt'
filename = 'BigFile.txt'
numLines = 0
numWords = 0
numChrs = 0
numMes = 0
f1 = open(output_file, 'w')
print 'Output File has been Opened'
with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChrs += len(line)
        if "X" in line and "Y" not in line and "Z" in line:
            numMes += 1
            print >>f1, line
            print 'Object found and Catalogued in Output.txt'
print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print "There are a total of %i things in this file" % (numMes)
print >>f1, "There are a total of %i things in this file" % (numMes)
f1.close()
print 'Output Files have been Closed'
My first guess was using line.enumeration, but I don't think I can just state something like lines - 5 to print the line that is 5 before lines:
lines = f1.enumeration()
if "blah blah" in line and "so so" not in line:
    print >>f1, lines
    print >>f1, [lines - 5]
The best part is yet to come though, because I have to take the Output.txt file and compare with another file to output the matching criteria in both files... but one step at a time, right?
-Also feel free to add in blurbs of 'proper' technique... I'm sure this script can be written a better way, so please do educate me on anything I am doing wrong.
Thanks in advance for any help!
UPDATE: Have successfully implemented the fix thanks to the help below:
import os
output_file = 'Output.txt'
filename = 'BigFile.txt'
numLines = 0
numWords = 0
numChrs = 0
numMulMes = 0
last5 = []
f1 = open(output_file, 'w')
print 'Output Files have been Opened'
with open(filename, 'r') as file:
    for line in file:
        wordsList = line.split()
        numLines += 1
        numWords += len(wordsList)
        numChrs += len(line)
        last5[:] = last5[-5:]+[line]
        if "X" in line and "Y" not in line and "Z" not in line:
            del last5[1:5] ###the missing piece of the puzzle!
            numMulMes += 1
            print >>f1, last5
            print 'Object found and Catalogued in Output.txt'
print "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print >>f1, "Lines: %i\nWords: %i\nCharacters: %i" % (numLines, numWords, numChrs)
print "There are a total of %i messages in this file" % (numMulMes)
print >>f1, "There are a total of %i messages in this file" % (numMulMes)
f1.close()
print 'Output Files have been Closed'
I kept trying to modify the output file with a separate script, and for the longest time I was fighting str vs. list operations and the errors that came with them. I finally decided to come back to the original script and throw it in there on a whim, and voilà.
Thanks for pushing me in the right direction, it was easy to figure out from there!
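One caveat with the update above, offered as a side note: `del last5[1:5]` changes the buffer itself, so if two matches fall within five lines of each other, the second one no longer has the right history behind it. A minimal sketch that reads the two entries without deleting anything is below. The name `match_with_context`, the `keyword` parameter (a stand-in for the real filter), and the sample data are made up for illustration.

```python
def match_with_context(lines, keyword, back=5):
    """Return (line `back` lines earlier, matching line) pairs.

    Hypothetical helper: `keyword` stands in for the real filter.
    """
    recent = []   # the current line plus up to `back` previous lines
    found = []
    for line in lines:
        recent = (recent + [line])[-(back + 1):]
        if keyword in line:
            # Read from the buffer instead of deleting from it, so the
            # history stays intact for the next match.
            earlier = recent[0] if len(recent) == back + 1 else None
            found.append((earlier, line))
    return found


sample = ["line %d" % i for i in range(10)]
sample[6] += " X"
sample[8] += " X"
pairs = match_with_context(sample, "X")
```

Here both matches are only two lines apart, and each still gets the line five positions before it. A match that occurs before `back` lines have been read is paired with `None`.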
You solved most of the stuff yourself (counting words, lines, line numbers, etc.). You can simply remember the last n lines while going through your file.
Example:
t = """zero line
one line
two line
three line
four line
five line
six line
seven line
eight line
"""
last5 = [] # memory cell
for l in t.split("\n"): # similar to your for line in file:
    last5[:] = last5[-4:]+[l] # keep last 4 and add current line, inplace list mod
    if "six" in l:
        print last5
You can also look at deque and specify a maximum length (you need to import it):
from collections import deque
last5 = deque(maxlen=5)
for l in t.split("\n"):
    last5.append(l) # will automatically only keep 5 (maxlen)
    if "six" in l:
        print last5
Output:
# list version
['two line', 'three line', 'four line', 'five line', 'six line']
# deque version
deque(['two line', 'three line', 'four line', 'five line', 'six line'], maxlen=5)
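Applied to the original question (the matching line plus only the single line N lines before it), the deque idea might look like the sketch below. It is written to run on both Python 2 and 3. `find_with_lookback`, `is_match`, and the sample lines are illustrative names, not part of the original script, and the filter is copied from the first version in the question.

```python
from collections import deque


def is_match(line):
    # Filter copied from the question; swap in the real criteria.
    return "X" in line and "Y" not in line and "Z" in line


def find_with_lookback(lines, n):
    """Yield (line n lines before, matching line) for every match."""
    window = deque(maxlen=n + 1)   # current line + the n lines before it
    for line in lines:
        window.append(line)
        if is_match(line):
            # window[0] is exactly n lines back only once the window is full
            earlier = window[0] if len(window) == n + 1 else None
            yield earlier, line


sample = ["header A", "b", "c", "d", "e", "X and Z here", "X Y Z skipped"]
results = list(find_with_lookback(sample, 5))
```

Since the function only iterates over `lines`, an open file object can be passed in directly. Opening both files in one `with` statement (for example `with open('BigFile.txt') as src, open('Output.txt', 'w') as out:`) and writing each pair with `out.write(...)` would also close them automatically, so no explicit `close()` calls are needed.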