python iterator iterable

Peek at the next character in a file


I need to read a file in Python in which each section begins with /*! and ends with a line starting with *:

/*!Text
 this text is to be printed, but it can expand
 several lines

 even empty lines, but they have to be printed in the same way they're encountered

 this until a * character is found
*

/*!Another section starts here
  whatever
*

The objective is to print the lines as they're encountered in each section for now (then I'll have to do some processing). To read a file in Python I have something like this:

# open file
with open(filename) as fh:

    fit = enumerate(iter(fh.readline, ''), start=1)

    # loop over lines
    for lino, line in fit:

        if line.startswith('/*!T'):
            lino, line = next(fit)
            print(lino, line)

Now, instead of printing a single line, I would like to keep printing lines until the next line starts with the string '/*!'. In C one would use a peek function, so is there something equivalent in Python?

UPDATE

So I may have made some progress by opening the file in binary mode (I'm using Python 3):

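# open file in binary mode so that relative seeks are allowed
with open(filename, 'rb') as fh:

    # in binary mode readline returns bytes, so the sentinel is b''
    fit = enumerate(iter(fh.readline, b''), start=1)

    # loop over lines
    for lino, line in fit:

        if line.startswith(b'/*!T'):
            while True:

                lino, line = next(fit)
                print(line.decode())

                # peek at the next character, then step back one character
                char = fh.read(1)
                if char:
                    fh.seek(-1, 1)
                # stop at the closing '*' or at the end of the file
                if char in (b'*', b''):
                    break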

But it seems to me there has to be a much more compact way to do this in Python. Any suggestions?


Solution

  • I'd use a regular expression:

    import re
    
    def get_sections(filename):
      with open(filename) as f:
        data = f.read()
      # (?s) lets . match newlines; (?m) lets ^ match at the start of each line
      return re.findall(r'(?sm)^/\*!(.*?)^\*', data)
    
    for section in get_sections('inputfile.txt'):
      print(section)
    
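To see what the pattern captures, here is a minimal sketch that runs it on an in-memory string instead of a file. The sample text and the names `SAMPLE` and `SECTION_RE` are made up for illustration; the pattern itself is the one used above.

```python
import re

# Hypothetical sample input, modelled on the format shown in the question.
SAMPLE = (
    "/*!Text\n"
    " first line\n"
    "\n"
    " second line\n"
    "*\n"
    "\n"
    "/*!Another section starts here\n"
    "  whatever\n"
    "*\n"
)

# (?s): '.' also matches newlines; (?m): '^' matches at the start of every line.
SECTION_RE = re.compile(r'(?sm)^/\*!(.*?)^\*')

# Each match is the text between '/*!' and the next line that starts with '*'.
sections = SECTION_RE.findall(SAMPLE)
for number, body in enumerate(sections, start=1):
    print("section", number)
    print(body, end="")
```

Two things to keep in mind with this approach: the whole file is read into memory, and any line inside a section that happens to start with `*` ends that section early.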

    Alternatively, I might create a generator function that yields only the section lines:

    def get_section_line(f):
      iterator = enumerate(f)
      for lno, line in iterator:
        if line.startswith("/*!"):
          yield lno, line.replace("/*!", "", 1)
          for lno, line in iterator:
            if line.startswith('*'):
              break
            yield lno, line
    
    with open('inputfile.txt') as f:
      for lno, line in get_section_line(f):
        print("%04d %s" % (lno, line.rstrip('\n')))
    
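The generator above sidesteps lookahead entirely, because the inner loop simply stops at the closing line. If a one-line lookahead is still wanted, as in the question, note that text-mode file objects have no peek method. Files opened with `'rb'` are `io.BufferedReader` objects, which do have `peek()`, but it returns raw bytes and may return more or fewer than requested, so it is awkward for line-oriented work. A small wrapper around any iterator is one possible alternative. In the sketch below, the class `Peekable` and the function `section_lines` are hypothetical names, and the sample text is invented.

```python
import io

class Peekable:
    """Wrap an iterator and allow looking at the next item without consuming it."""

    _EMPTY = object()  # marker meaning "nothing is buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._buffer = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        # Hand out the buffered item first, if there is one.
        if self._buffer is not self._EMPTY:
            item, self._buffer = self._buffer, self._EMPTY
            return item
        return next(self._it)

    def peek(self, default=None):
        """Return the next item without consuming it, or default at the end."""
        if self._buffer is self._EMPTY:
            self._buffer = next(self._it, self._EMPTY)
        return default if self._buffer is self._EMPTY else self._buffer


def section_lines(f):
    """Yield the lines of every section, stopping before each closing '*' line."""
    lines = Peekable(f)
    for line in lines:
        if line.startswith("/*!"):
            yield line.replace("/*!", "", 1)
            # Keep reading while the upcoming line does not close the section.
            # At the end of the file, peek returns "*" so the loop also stops.
            while not lines.peek(default="*").startswith("*"):
                yield next(lines)


# Invented sample input standing in for the real file.
sample = io.StringIO("/*!Text\n a\n\n b\n*\n\n/*!Other\n  c\n*\n")
result = list(section_lines(sample))
for line in result:
    print(line, end="")
```

Because the wrapper works on any iterator, it can be combined with `enumerate` if line numbers are needed.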

    Finally, here is a solution which maintains the section structure, in case knowing which section you're in matters:

    import itertools
    def get_sections(f):
      it = enumerate(f)
      for lno, line in it:
        if line.startswith("/*!"):
          yield itertools.chain(
              [(lno,line.replace("/*!","",1))],
              itertools.takewhile(lambda i: not i[1].startswith('*'), it))
    
    with open('inputfile.txt') as f:
      for secno, section in enumerate(get_sections(f)):
        for lno, line in section:
          print("%04d %04d %s" % (secno, lno, line.rstrip('\n')))
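
One caveat with the last version: each section is a lazy iterator that shares the underlying file iterator, so a section has to be consumed completely before moving on to the next one. Collecting the sections first, for example with `list(get_sections(f))`, and reading them later would leave each section with only its header line. If the sections need to be stored, a variant that builds a list per section is one option. This is only a sketch; the name `get_section_lists` is hypothetical and the sample text is invented.

```python
import io

def get_section_lists(f):
    """Yield each section as a list of (line_number, text) pairs.

    Assumption of this sketch: a final section without a closing '*'
    line is silently dropped.
    """
    section = None  # None means we are currently outside a section
    for lno, line in enumerate(f):
        if section is None:
            if line.startswith("/*!"):
                section = [(lno, line.replace("/*!", "", 1))]
        elif line.startswith("*"):
            # Closing line: hand out the finished section.
            yield section
            section = None
        else:
            section.append((lno, line))


# Invented sample input standing in for the real file.
sample = io.StringIO("/*!Text\n a\n*\n\n/*!Other\n  b\n*\n")
sections = list(get_section_lists(sample))  # safe to store: plain lists
for secno, section in enumerate(sections):
    for lno, line in section:
        print("%04d %04d %s" % (secno, lno, line.rstrip("\n")))
```

The trade-off is memory: each section is held in full before it is yielded, whereas the `takewhile` version streams lines one at a time.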