Tags: python, generator, regex-lookarounds

Using lookahead with generators


I have implemented a generator-based scanner in Python that tokenizes a string into tuples of the form (token type, token value):

for token in scan("a(b)"):
    print(token)

would print

("literal", "a")
("l_paren", "(")
...

The next task is to parse the token stream, and for that I need to be able to look one item ahead of the current one without advancing the iterator. Because iterators and generators produce each item on demand rather than providing the complete sequence at once, lookahead is trickier than with lists: the next item is not known until __next__() is called.
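
A minimal sketch of the problem, assuming Python 3 and a made-up toy generator numbers() that stands in for any generator: looking at the next item with next() also consumes it, so later iteration never sees it.

```python
def numbers():
    """Toy generator used only for this illustration (hypothetical)."""
    for n in (1, 2, 3):
        yield n


gen = numbers()
peeked = next(gen)      # the only built-in way to see the next item...
remaining = list(gen)   # ...but the item is now gone from the stream
print(peeked, remaining)  # 1 [2, 3]
```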

What could a straightforward implementation of a generator-based lookahead look like? Currently I'm using a workaround that involves building a list from the generator:

token_list = list(scan(string))

The lookahead is then easily implemented with something like this:

try:
    next_token = token_list[index + 1]
except IndexError:
    next_token = None

Of course, this works fine. But thinking it over raises a second question: is there really any point in making scan() a generator in the first place?


Solution

  • You can write a wrapper that buffers some number of items from the generator, and provides a lookahead() function to peek at those buffered items:

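    class Lookahead:
        """Iterator wrapper that buffers items so they can be peeked at."""

        def __init__(self, iterable):
            self.iter = iter(iterable)  # accept any iterable, not just iterators
            self.buffer = []            # items fetched ahead but not yet consumed

        def __iter__(self):
            return self

        def __next__(self):
            # Serve buffered items first, then fall back to the underlying iterator.
            if self.buffer:
                return self.buffer.pop(0)
            return next(self.iter)

        next = __next__  # alias so the class also works under Python 2

        def lookahead(self, n):
            """Return the item n entries ahead (0 = the next item), or None if the iteration ends first."""
            while n >= len(self.buffer):
                try:
                    self.buffer.append(next(self.iter))
                except StopIteration:
                    return None
            return self.buffer[n]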
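
To show how such a wrapper might be used on a token stream, here is a self-contained sketch, assuming Python 3. It is a variant of the idea above, not the same class: the names Peekable and peek() are made up for this example, the buffer is a collections.deque so that removing from the front is cheap, and scan() is only a toy stand-in for the scanner from the question.

```python
from collections import deque


def scan(text):
    """Toy stand-in for the question's scanner (hypothetical)."""
    kinds = {"(": "l_paren", ")": "r_paren"}
    for char in text:
        yield (kinds.get(char, "literal"), char)


class Peekable:
    """Iterator wrapper with lookahead; pending items are kept in a deque."""

    # Internal sentinel: distinguishes "iterator exhausted" from a real None item.
    _MISSING = object()

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._buffer = deque()

    def __iter__(self):
        return self

    def __next__(self):
        # Consume buffered items before pulling new ones from the source.
        if self._buffer:
            return self._buffer.popleft()
        return next(self._iter)

    def peek(self, n=0, default=None):
        """Return the item n positions ahead without consuming it, or default."""
        while len(self._buffer) <= n:
            item = next(self._iter, self._MISSING)
            if item is self._MISSING:
                return default
            self._buffer.append(item)
        return self._buffer[n]


tokens = Peekable(scan("a(b)"))
# Pair every token with the one that follows it (None after the last token).
pairs = [(token, tokens.peek()) for token in tokens]
for token, upcoming in pairs:
    print(token, "-> next:", upcoming)
```

Only the items that have been peeked at are held in memory, so the scanner can stay a generator and the parser still gets its one-token lookahead.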