I am working on a Python parser using PLY, and I have to parse input of the form:
VAR VAR1 001
+000 000 000 000
where the code would create a variable named VAR1 and then assign the value 0 to it.
The regex I wrote for the instantiation is:
t_INST = r'[\+|-]0[ ][0-9][0-9][0-9][ ][0-9][0-9][0-9][ ][0-9][0-9][0-9][ ][0-9][0-9][0-9]'
However, when I run my program, PLY prints the following:
Illegal character '+'
A reproducer follows:
import ply.lex as lex

tokens = ['INST']

# Sign, a single 0, then (three digits + space) three times, then three more digits
t_INST = r'[+-]0[ ](\d{3}[ ]){3}\d{3}'
t_ignore = ' \t'

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()

def parse(input_string):
    ret = []
    lexer.input(input_string)
    while True:
        tok = lexer.token()
        if not tok:
            break  # No more input
        ret.append((tok.type, tok.value))
    return ret

print(parse("+0 000 000 000"))
The input passed to parse() in the reproducer:
+0 000 000 000
doesn't match your stated input format of
VAR VAR1 001
+000 000 000 000
If the actual data is in the same form as +0 000 000 000, then the pattern must expect three groups of three digits after the leading 0, not four, so you actually want:
t_INST = r'[+-]0\s(?:\d{3}\s){2}\d{3}'
With that pattern, the output is: [('INST', '+0 000 000 000')]
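To check which pattern fits which input without running the whole lexer, the patterns can be tried directly with Python's re module. The following is only a sketch: the names FOUR_GROUPS, THREE_GROUPS, STATED_FORMAT and matches() are made up for illustration, and STATED_FORMAT is an assumed pattern for the +000 000 000 000 form given in the question, not something taken from the original code. It uses re.VERBOSE because that is the flag PLY applies to token patterns by default.

```python
import re

# PLY compiles token patterns with re.VERBOSE by default, so the same flag is
# used here. A space inside a character class such as [ ] stays significant.
FLAGS = re.VERBOSE

# Pattern from the reproducer: sign, a single 0, then FOUR three-digit groups.
FOUR_GROUPS = r'[+-]0[ ](\d{3}[ ]){3}\d{3}'
# Pattern from the answer: sign, a single 0, then THREE three-digit groups.
THREE_GROUPS = r'[+-]0\s(?:\d{3}\s){2}\d{3}'
# Assumed pattern for the format stated in the question (+000 000 000 000):
# a sign followed by four three-digit groups, with no separate leading 0.
STATED_FORMAT = r'[+-]\d{3}(?:[ ]\d{3}){3}'


def matches(pattern, text):
    """Return True if pattern matches at the start of text, as a lexer would try it."""
    return re.match(pattern, text, FLAGS) is not None


if __name__ == "__main__":
    candidates = [("FOUR_GROUPS", FOUR_GROUPS),
                  ("THREE_GROUPS", THREE_GROUPS),
                  ("STATED_FORMAT", STATED_FORMAT)]
    for name, pattern in candidates:
        for text in ["+0 000 000 000", "+000 000 000 000"]:
            print(name, repr(text), matches(pattern, text))
```

If the real data looks like the stated format rather than the reproducer's input, a pattern along the lines of STATED_FORMAT is probably closer to what t_INST needs.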