I am new to Python and am just trying things out with it.
I have a huge file. After finding a search phrase in it, I need to go back n lines to the start of the enclosing text, i.e. its start tag.
After that, I need to start reading from that position.
The phrase can occur multiple times, and there are multiple start tags. Here is a sample file:
<module>
hi
flowers
<name>xxx</name>
<age>46</age>
</module>
<module>
<place>yyyy</place>
<name>janiiiii</janii>
</module>
Assume the search is for <name>, and I need to go back to the <module> line once I find it. The number of lines between <module> and <name> varies; it is not fixed. So once I find the name, I need to go back to the module line and start reading from there.
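Here is my code so far:
from itertools import islice

lastiterline = None
line_num = 0
search_phrase = "janiii"  # matching is case-sensitive
# Raw string so the backslash in the Windows path is taken literally.
with open(r'c:\sample.txt', "r") as f:
    for line in f:
        line_num += 1
        line = line.strip()
        if line.startswith("<module>"):
            # Remember the most recent <module> line and its line number.
            lastiterline = line
            linec = line_num
        elif line.find(search_phrase) >= 0:
            if lastiterline:
                print(line)
                print(linec)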
This gives me the line number of the module that contains the searched word, but I am unable to move the file pointer back to start reading the lines again from that module. There will be multiple search phrases, so every time I find one I need to go back to that line without breaking the main for loop, which reads the entire huge file.
For example, there may be 100 module tags, and only 10 of them contain the search phrases I want, so I need just those 10 modules.
OK, here is an example, so you can be more specific about what you need.
This is a sample of your huge_file.txt:
wgoi jowijg
<start tag>
wfejoije jfie
fwjoejo
THE PHRASE
jwieo
<end tag>
wjefoiw wgworjg
<start tag>
wjgoirg
<end tag>
<start tag>
wfejoije jfie
fwjoejo
woeoj
jwieo
THE PHRASE
<end tag>
And a script, read_prev_lines.py:
# Read the whole file into a list of lines (this holds the entire file in memory).
with open("huge_file.txt", "r") as f:
    hugefile = f.readlines()

start_locations = []  # one dict per block: {"start": idx, "phr": idx, "end": idx}
for idx, line in enumerate(hugefile):
    if "<start tag>" in line:
        start_locations.append({"start": idx})
    if not start_locations:
        continue  # ignore lines before the first start tag
    if "THE PHRASE" in line:
        start_locations[-1]["phr"] = idx
    if "<end tag>" in line:
        start_locations[-1]["end"] = idx

for block_num, block in enumerate(start_locations):
    if "phr" in block:
        print("Found THE PHRASE after %d start tag(s), at line %d:" % (block_num, block["phr"]))
        print("Here is the whole block that contains the phrase:")
        print(hugefile[block["start"]: block["end"] + 1])
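If the file is too large to load with readlines(), one alternative is to avoid going back at all: keep the lines of the current block in a small buffer while reading forward, and hand the buffer over only if the phrase turned up before the end tag. The following is only a sketch under assumptions. The function name blocks_containing and its parameters start_tag and end_tag are made up for illustration, and it assumes each tag sits on its own line and that blocks are not nested, as in the <module> sample from the question.

```python
def blocks_containing(lines, phrase, start_tag="<module>", end_tag="</module>"):
    """Yield (start_line_number, block_lines) for each block that contains phrase.

    `lines` can be any iterable of strings, including an open file object,
    so only one block at a time is held in memory.
    """
    block = None      # lines of the block being read, or None when outside a block
    start_line = 0    # 1-based line number of the current block's start tag
    found = False     # whether the phrase has been seen in the current block
    for line_num, line in enumerate(lines, start=1):
        text = line.strip()
        if text.startswith(start_tag):
            block, start_line, found = [], line_num, False
        if block is None:
            continue  # skip lines that are outside any block
        block.append(text)
        if phrase in text:
            found = True
        if text.startswith(end_tag):
            if found:
                yield start_line, block
            block = None


# Demo on the sample from the question, kept in memory instead of a file.
sample = [
    "<module>", "hi", "flowers", "<name>xxx</name>", "<age>46</age>", "</module>",
    "<module>", "<place>yyyy</place>", "<name>janiiiii</janii>", "</module>",
]
result = list(blocks_containing(sample, "janiii"))
for start, block in result:
    print("Block starting at line %d:" % start)
    print(block)
```

To use it on a real file, pass the open file object instead of the list, for example inside with open("huge_file.txt") as f. Because it is a generator, the main loop over the file is never interrupted, and blocks without the phrase are discarded as soon as their end tag is reached.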