I need to read a file in Python in which each section starts with `/*!` and ends with a line containing `*`:
```
/*!Text
this text is to be printed, but it can expand
several lines
even empty lines, but they have to be printed in the same way they're encountered
this until a * character is found
*
/*!Another section starts here
whatever
*
```
For now, the goal is to print each section's lines exactly as they appear (I'll add some processing later). To read the file I have something like this:
```python
# open file
with open(filename) as fh:
    fit = enumerate(iter(fh.readline, ''), start=1)
    # loop over lines
    for lino, line in fit:
        if line.startswith('/*!T'):
            lino, line = next(fit)
            print(lino, line)
```
Now, instead of printing a single line, I would like to keep printing lines until the next line starts with the string `/*!`. In C one would peek at the next character, so is there something equivalent in Python?
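Plain Python iterators have no `peek()`, but a one-item lookahead is easy to sketch. The `Peekable` class and `collect_sections` function below are hypothetical helpers written for illustration (the third-party `more-itertools` package ships a similar `peekable` class). `collect_sections` uses `peek()` to stop *before* a line that opens a new section, so that line is left for the outer loop, and it also treats a line starting with `*` as an end marker:

```python
class Peekable:
    """Wrap an iterator so the next item can be inspected without consuming it.

    Hypothetical helper for illustration; not part of the standard library.
    """

    _EMPTY = object()  # sentinel meaning "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out the item stored by peek() first, if there is one.
        if self._buffer is not self._EMPTY:
            item, self._buffer = self._buffer, self._EMPTY
            return item
        return next(self._it)

    def peek(self, default=None):
        # Fetch the next item once and keep it for the following __next__().
        if self._buffer is self._EMPTY:
            try:
                self._buffer = next(self._it)
            except StopIteration:
                return default
        return self._buffer


def collect_sections(lines):
    """Return the body lines of each /*! ... * section (header lines dropped)."""
    sections = []
    it = Peekable(lines)
    for line in it:
        if not line.startswith('/*!'):
            continue  # text outside any section
        body = []
        while True:
            upcoming = it.peek()
            if upcoming is None or upcoming.startswith('/*!'):
                break  # leave the next header for the outer loop
            line = next(it)
            if line.startswith('*'):
                break  # end marker consumed and discarded
            body.append(line)
        sections.append(body)
    return sections
```

Because `collect_sections` only needs an iterable of strings, it can be called directly on an open text file: `collect_sections(fh)`.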
UPDATE
I may have made some progress by opening the file in binary mode (I'm using Python 3):
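```python
# open file
with open(filename, 'rb') as fh:
    # in binary mode readline() returns bytes, so the sentinel must be b''
    fit = enumerate(iter(fh.readline, b''), start=1)
    # loop over lines
    for lino, line in fit:
        if not line:
            break
        if line.startswith(b'/*!T'):
            while True:
                lino, line = next(fit)
                # decode the bytes instead of printing their b'...' repr
                print(line.decode(), end='')
                char = fh.read(1)
                # back one character
                fh.seek(-1, 1)
                if char == b'*':
                    break
```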
But it seems to me there has to be a much more compact way to do this in Python. Any suggestions?
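For reference, the read-then-`seek(-1, 1)` trick in the update can be avoided: a file opened with `'rb'` is an `io.BufferedReader`, whose `peek()` method returns buffered bytes without advancing the file position (it may return more, or at end of file fewer, bytes than requested). The sketch below assumes that, uses a hypothetical function name `read_sections`, reads lines with `readline()` directly (so it does not track line numbers), and, like the update, treats a line starting with `*` as the end of a section:

```python
import io


def read_sections(fh):
    """Return decoded body lines of each section from a binary buffered file."""
    sections = []
    while True:
        line = fh.readline()
        if not line:  # b'' means end of file
            break
        if not line.startswith(b'/*!'):
            continue  # closing '*' lines and text outside sections
        body = []
        # Inspect the first byte of the next line without consuming it;
        # slice because peek() may return more than one byte.
        while fh.peek(1)[:1] not in (b'*', b''):
            body.append(fh.readline().decode())
        sections.append(body)
    return sections


# Demo on an in-memory buffered stream instead of a real file.
demo = io.BufferedReader(io.BytesIO(b"/*!Text\na\n\nb\n*\n/*!Another\nwhatever\n*\n"))
for body in read_sections(demo):
    print(''.join(body), end='')
```

With a real file the call is the same: `with open(filename, 'rb') as fh: sections = read_sections(fh)`.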
I'd use a regular expression:
```python
import re

def get_sections(filename):
    with open(filename) as f:
        data = f.read()
    return re.findall(r'(?sm)^/\*!(.*?)^\*', data)

for section in get_sections('inputfile.txt'):
    print(section)
```
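To see what the pattern does without a file, the same regular expression can be applied to a string. `sections_from_text` and `sample` are illustrative names, not part of the answer above. Each returned string includes the title that follows `/*!`; note that any body line beginning with `*` would end its section early, and the whole file is read into memory:

```python
import re

# (?s) DOTALL: '.' also matches newlines, so a section can span lines.
# (?m) MULTILINE: '^' matches at the start of every line, not only the string.
# (.*?) is non-greedy: each match stops at the FIRST line beginning with '*'.
SECTION_RE = re.compile(r'(?sm)^/\*!(.*?)^\*')


def sections_from_text(data):
    """Return the text of every /*! ... * section, title line included."""
    return SECTION_RE.findall(data)


sample = (
    "/*!Text\n"
    "first line\n"
    "\n"
    "last line\n"
    "*\n"
    "/*!Another section starts here\n"
    "whatever\n"
    "*\n"
)

for section in sections_from_text(sample):
    print(repr(section))
# 'Text\nfirst line\n\nlast line\n'
# 'Another section starts here\nwhatever\n'
```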
Alternatively, I might create a generator function that yields only the section lines:
```python
def get_section_line(f):
    iterator = enumerate(f)
    for lno, line in iterator:
        if line.startswith("/*!"):
            # header line, with the opening marker removed
            yield lno, line.replace("/*!", "", 1)
            # keep consuming the same iterator until the closing '*' line
            for lno, line in iterator:
                if line.startswith('*'):
                    break
                yield lno, line

with open('inputfile.txt') as f:
    for lno, line in get_section_line(f):
        print("%04d %s" % (lno, line.rstrip('\n')))
```
Finally, here is a solution which maintains the section structure, in case knowing which section you're in matters:
```python
import itertools

def get_sections(f):
    it = enumerate(f)
    for lno, line in it:
        if line.startswith("/*!"):
            # header line (marker stripped) followed by the body lines up to,
            # but not including, the closing '*' line
            yield itertools.chain(
                [(lno, line.replace("/*!", "", 1))],
                itertools.takewhile(lambda i: not i[1].startswith('*'), it))

with open('inputfile.txt') as f:
    for secno, section in enumerate(get_sections(f)):
        for lno, line in section:
            print("%04d %04d %s" % (secno, lno, line.rstrip('\n')))
```