Tags: python, iterator, yield

Python - next method not working properly with generator


I made a class in Python that splits a stream of code into tokens and advances token by token so I can work with them:

import re

class Tokenizer:

    def __init__(self, input_file):
        self.in_file = input_file
        self.tokens = []
        self.current_token = None
        self.next_token = None
        self.line = 1

    def split_tokens(self):
        ''' Create a list with all the tokens of the input file '''
        self.tokens = re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", self.in_file)

    def __iter__(self):
        for token in self.tokens:
            if token != '\n':
                yield token 
            else:
                self.line += 1

    def advance(self):
        self.current_token = self.next_token
        self.next_token = next(self.__iter__())

After initialization:

text = 'constructor SquareGame03 new()\n\
       {let square=square;\n\
       let direction=direction;\n\
       return square;\n\
       }'

t = Tokenizer(text)
t.split_tokens()
t.advance()

It seems to work if I print the tokens:

print(t.current_token, t.next_token)
None constructor

but every subsequent call of the advance method gives these results:

t.advance()
print(t.current_token, t.next_token)
constructor constructor
t.advance()
print(t.current_token, t.next_token)
constructor constructor

So it's not advancing, and I can't understand why.


Solution

  • In this case, .__iter__ is implemented as a generator function, so each call to it builds and returns a brand-new generator iterator.

    Every time Tokenizer.advance is called, .__iter__ creates and returns a new generator iterator, which starts again from the first token. Instead, a single iterator should be created and stored on the Tokenizer object at initialization and reused by all subsequent calls.
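
    A minimal illustration of that point (my sketch, not part of the original answer): every call to a generator function yields a distinct iterator that starts over from the beginning.

    ```python
    def gen():
        yield from [1, 2, 3]

    a = gen()
    b = gen()
    assert a is not b        # two independent generator iterators
    assert next(a) == 1
    assert next(a) == 2
    assert next(b) == 1      # b is unaffected by a; it starts from the beginning
    ```
    
    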

    For example:

    import re
    
    class Tokenizer:
    
        def __init__(self, input_file):
            self.in_file = input_file
            self.tokens = []
            self.current_token = None
            self.next_token = None
            self.line = 1
    
        def split_tokens(self):
            ''' Create a list with all the tokens of the input file '''
            self.tokens = re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", self.in_file)
            self.iterator = self.__iter__()  # create the iterator once and reuse it
    
        def __iter__(self):
            for token in self.tokens:
                if token != '\n':
                    yield token 
                else:
                    self.line += 1
    
        def advance(self):
            self.current_token = self.next_token
            self.next_token = next(self.iterator)
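
    With the stored iterator, repeated advance() calls now step through the tokens. A quick self-contained check (same class as above, with a raw-string regex to avoid invalid-escape warnings; the input is a shortened version of the question's text):

    ```python
    import re

    class Tokenizer:
        def __init__(self, input_file):
            self.in_file = input_file
            self.tokens = []
            self.current_token = None
            self.next_token = None
            self.line = 1

        def split_tokens(self):
            '''Create a list with all the tokens of the input file.'''
            self.tokens = re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", self.in_file)
            self.iterator = iter(self)  # store one iterator for all advance() calls

        def __iter__(self):
            for token in self.tokens:
                if token != '\n':
                    yield token
                else:
                    self.line += 1

        def advance(self):
            self.current_token = self.next_token
            self.next_token = next(self.iterator)

    t = Tokenizer('constructor SquareGame03 new()\n{let square=square;\n}')
    t.split_tokens()
    t.advance()
    assert (t.current_token, t.next_token) == (None, 'constructor')
    t.advance()
    assert (t.current_token, t.next_token) == ('constructor', 'SquareGame03')
    t.advance()
    assert (t.current_token, t.next_token) == ('SquareGame03', 'new')
    ```
    
    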
    

    Another minimal example that may make this clearer:

    def fib():
        a = 0
        b = 1
        while True:
            yield b
            a, b = b, a + b
    
    # 1, 1, 2, ...
    fibs = fib()
    next(fibs)
    next(fibs)
    next(fibs)
    
    # 1, 1, 1, ...
    next(fib())
    next(fib())
    next(fib())
    

    By the way, I don't see a reason to mix the .__iter__ magic method with a separate .advance method. It might introduce some confusion.
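
    If that mixing is unwanted, one alternative (my sketch, not part of the original answer) is to drop the generator entirely and keep a plain index cursor; line counting is omitted here for brevity:

    ```python
    import re

    class Tokenizer:
        def __init__(self, text):
            # raw string avoids invalid-escape warnings in the pattern;
            # newlines are filtered out up front instead of during iteration
            self.tokens = [t for t in re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", text)
                           if t != '\n']
            self.pos = 0
            self.current_token = None
            self.next_token = self.tokens[0] if self.tokens else None

        def advance(self):
            self.current_token = self.next_token
            self.pos += 1
            self.next_token = self.tokens[self.pos] if self.pos < len(self.tokens) else None

    t = Tokenizer('let x = 1;')
    assert (t.current_token, t.next_token) == (None, 'let')
    t.advance()
    assert (t.current_token, t.next_token) == ('let', 'x')
    t.advance()
    assert (t.current_token, t.next_token) == ('x', '=')
    ```

    This also avoids the StopIteration that next() raises when the stored generator is exhausted: past the end, next_token simply becomes None.
    
    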