I made a class in Python that splits a stream of code into tokens and advances token by token so I can work with them:
```python
import re

class Tokenizer:
    def __init__(self, input_file):
        self.in_file = input_file
        self.tokens = []
        self.current_token = None
        self.next_token = None
        self.line = 1

    def split_tokens(self):
        ''' Create a list with all the tokens of the input file '''
        self.tokens = re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", self.in_file)

    def __iter__(self):
        for token in self.tokens:
            if token != '\n':
                yield token
            else:
                self.line += 1

    def advance(self):
        self.current_token = self.next_token
        self.next_token = next(self.__iter__())
```
After initialization:
```python
text = 'constructor SquareGame03 new()\n\
{let square=square;\n\
let direction=direction;\n\
return square;\n\
}'
t = Tokenizer(text)
t.split_tokens()
t.advance()
```
It seems to work if I print the tokens:
```python
print(t.current_token, t.next_token)
# None constructor
```
but every subsequent call of the `advance` method gives these results:
```python
t.advance()
print(t.current_token, t.next_token)
# constructor constructor
t.advance()
print(t.current_token, t.next_token)
# constructor constructor
```
So it's not advancing, and I can't understand why.
In this case, `.__iter__` is a generator function: calling it does not resume anything, it builds and returns a brand-new generator iterator that starts at the beginning of `self.tokens`. Every time `Tokenizer.advance` is called, `next(self.__iter__())` creates such a fresh generator and takes only its first element, so `next_token` is always `'constructor'`. Instead, the iterator should be created once (in the code below, right after the tokens are split) and stored on the `Tokenizer` object for all subsequent calls.
For example, creating the iterator once in `split_tokens`:
```python
import re

class Tokenizer:
    def __init__(self, input_file):
        self.in_file = input_file
        self.tokens = []
        self.current_token = None
        self.next_token = None
        self.line = 1

    def split_tokens(self):
        ''' Create a list with all the tokens of the input file '''
        self.tokens = re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", self.in_file)
        # Create the generator once and keep it, so advance() resumes where it stopped.
        self.iterator = self.__iter__()

    def __iter__(self):
        for token in self.tokens:
            if token != '\n':
                yield token
            else:
                self.line += 1

    def advance(self):
        self.current_token = self.next_token
        # Pull the next token from the stored generator, not from a fresh one.
        self.next_token = next(self.iterator)
```
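One way to check the fix is to drive it with the question's input. The sketch below is the same class with one assumed change: it calls `next(self.iterator, None)`, so `next_token` becomes `None` at the end of the input instead of `advance()` raising `StopIteration`. Whether `None` is a good end marker depends on the code consuming the tokens.

```python
import re

class Tokenizer:
    def __init__(self, input_file):
        self.in_file = input_file
        self.tokens = []
        self.current_token = None
        self.next_token = None
        self.line = 1

    def split_tokens(self):
        '''Create a list with all the tokens of the input file.'''
        self.tokens = re.findall(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]", self.in_file)
        # Build the generator once; advance() keeps pulling from this same object.
        self.iterator = iter(self)

    def __iter__(self):
        for token in self.tokens:
            if token != '\n':
                yield token
            else:
                self.line += 1

    def advance(self):
        self.current_token = self.next_token
        # Assumed variant: the default None is returned once the tokens run out,
        # instead of letting StopIteration escape from advance().
        self.next_token = next(self.iterator, None)


text = ('constructor SquareGame03 new()\n'
        '{let square=square;\n'
        'let direction=direction;\n'
        'return square;\n'
        '}')

t = Tokenizer(text)
t.split_tokens()
for _ in range(4):
    t.advance()
    print(t.current_token, t.next_token)
# None constructor
# constructor SquareGame03
# SquareGame03 new
# new (
```

Note that `self.line` is incremented while the generator fetches the lookahead token, so it tracks the line of `next_token` rather than the line of `current_token`.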
Another minimal example that may help explain the difference:
```python
def fib():
    a = 0
    b = 1
    while True:
        yield b
        a, b = b, a + b

# Reusing one generator: 1, 1, 2, ...
fibs = fib()
next(fibs)
next(fibs)
next(fibs)

# A new generator on every call: 1, 1, 1, ...
next(fib())
next(fib())
next(fib())
```
By the way, I don't see a reason to mix the `.__iter__` magic method with a separate `.advance` method. Having two different ways to walk through the same tokens might introduce some confusion.
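If only the iteration protocol is kept, one option (a sketch, not the only reasonable design) is to drop the class and its `advance` state and write a plain generator function; the caller then steps through the tokens with `next()` or a `for` loop. `tokenize` and `TOKEN_RE` are hypothetical names, and pairing each token with its own line number is an assumed addition so that the line counter does not have to live in shared object state.

```python
import re

# Hypothetical module-level pattern: the question's regex, compiled once.
TOKEN_RE = re.compile(r"\w+|[{}()\[\].;,+\-*/&|<>=~\n]")

def tokenize(text):
    """Yield (line_number, token) pairs, skipping the newline tokens."""
    line = 1
    for token in TOKEN_RE.findall(text):
        if token == '\n':
            line += 1
        else:
            yield line, token

tokens = tokenize("constructor SquareGame03 new()\n{let square=square;")
print(next(tokens))  # (1, 'constructor')
print(next(tokens))  # (1, 'SquareGame03')

# The same generator continues with the remaining tokens:
print(list(tokens))
# [(1, 'new'), (1, '('), (1, ')'), (2, '{'), (2, 'let'), (2, 'square'), (2, '='), (2, 'square'), (2, ';')]
```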