I am attempting to tokenize the following code:
foo ::= 5
bar ::= 15
foobar ::= 20
so that the output is:
['foo', '::=', '5', '\n', 'bar', '::=', '15', '\n', 'foobar', '::=', '20']
My current attempt is the following:
import re

# Split on every single space character
reTokens = re.compile(r' ')
tokens = reTokens.split(source)
print(tokens)
However this prints:
['\n', '', '', '', 'foo', '::=', '5\n', '', '', '', 'bar', '::=', '15\n', '', '', '', 'foobar', '::=', '20\n']
As you can see, there are a lot of issues. A couple of major problems are:
You could do:
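from functools import reduce
# Strip the source first so blank lines at either end do not add stray '\n' tokens
lines = source.strip().splitlines()
# Split each line on runs of whitespace
tokens_list = [line.strip().split() for line in lines]
# Concatenate the per-line token lists with a '\n' token between them
tokens = reduce(lambda x, y: x + ['\n'] + y, tokens_list)
print(tokens)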
This splits the source into lines, tokenizes each line, and then joins the results into a single list with a '\n' between each line.
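If you would rather stay with the re module, as in the original attempt, one alternative is to match the tokens with findall instead of splitting on the separators. This is only a sketch: the tokenize name is made up for illustration, and the source string is a guess at the input, reconstructed from the printed output above (a leading newline and four-space indentation).

```python
import re

# Assumed input, reconstructed from the output shown in the question
source = """
    foo ::= 5
    bar ::= 15
    foobar ::= 20
"""

def tokenize(text):
    # Hypothetical helper: a token is either a newline
    # or a run of non-whitespace characters
    token_pattern = re.compile(r'\n|\S+')
    # strip() removes the blank space before the first and after the last line
    return token_pattern.findall(text.strip())

tokens = tokenize(source)
print(tokens)
# ['foo', '::=', '5', '\n', 'bar', '::=', '15', '\n', 'foobar', '::=', '20']
```

Spaces and tabs match neither alternative, so they are skipped instead of producing empty strings. Note that a blank line in the middle of the source would yield two consecutive '\n' tokens.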