Tags: python, regex, python-2.7, ply

PLY: Illegal Character '+'


I am working on a Python parser using PLY, and I have to parse input of the form:

VAR VAR1 001 
+000 000 000 000

The code should create a variable named VAR1 and then assign the value 0 to it.

The regex I wrote for the instantiation is:

t_INST = r'[\+|-]0[ ][0-9][0-9][0-9][ ][0-9][0-9][0-9][ ][0-9][0-9][0-9][ ][0-9][0-9][0-9]'

However, when running my programme, PLY prints the following:

Illegal character '+'

A reproducer follows:

import ply.lex as lex

tokens = ['INST']
t_INST = r'[+-]0[ ](\d{3}[ ]){3}\d{3}'
t_ignore  = ' \t'
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()

def parse(input_string):
    ret = []
    lexer.input (input_string)
    while True:
        tok = lexer.token()
        if not tok:
            break      # No more input
        ret.append((tok.type, tok.value))
    return ret

print parse("+0 000 000 000")

Solution

  • The line:

    print parse("+0 000 000 000")
    

    doesn't match your stated input format of

    VAR VAR1 001 
    +000 000 000 000
    

    If the actual data is in the same form as +0 000 000 000, the pattern has to expect three groups of three digits after the leading 0, not four, so you actually want:

    t_INST = r'[+-]0\s(?:\d{3}\s){2}\d{3}'
    

    ...with which the output is: [('INST', '+0 000 000 000')]
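
To see why the lexer reports an illegal character, it can help to try the patterns outside PLY with the standard `re` module. The sketch below is only an illustration. It assumes PLY compiles token patterns with `re.VERBOSE` (which, as far as I know, is its default). The `matches_at_start` helper and the `STATED_FORMAT` pattern are hypothetical names that are not part of the question; `STATED_FORMAT` is just one guess at a pattern for the `+000 000 000 000` form.

```python
import re

# Pattern from the reproducer: sign, "0", space, then FOUR 3-digit groups.
ORIGINAL = r'[+-]0[ ](\d{3}[ ]){3}\d{3}'
# Pattern from the answer: sign, "0", whitespace, then THREE 3-digit groups.
FIXED = r'[+-]0\s(?:\d{3}\s){2}\d{3}'
# Hypothetical pattern for the stated format: sign, then four 3-digit groups.
STATED_FORMAT = r'[+-]\d{3}(?:\s\d{3}){3}'


def matches_at_start(pattern, text):
    """Return the text matched at position 0, or None if nothing matches."""
    # Assumption: re.VERBOSE mirrors how PLY compiles token regexes.
    m = re.match(pattern, text, re.VERBOSE)
    return m.group(0) if m else None


test_input = "+0 000 000 000"      # what the reproducer actually feeds in
stated_input = "+000 000 000 000"  # what the question says the data looks like

original_result = matches_at_start(ORIGINAL, test_input)
fixed_result = matches_at_start(FIXED, test_input)
stated_result = matches_at_start(STATED_FORMAT, stated_input)

print(original_result)  # None: no token matches at '+', so t_error fires
print(fixed_result)     # +0 000 000 000
print(stated_result)    # +000 000 000 000
```

When no token pattern matches at the current position, PLY calls `t_error` with the remaining input, which is where the `Illegal character '+'` message comes from.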