Tags: python, lex, lexer, ply

What is the order of preference when we mix function and string type token definitions in ply.lex?


import ply.lex as lex

tokens = (
    'NUMBER2',
    'NUMBER1',
)

def t_NUMBER1(t):
    r'[0-9]+'
    return t

t_NUMBER2 = r'[0-9][0-9]'

If I use the above token specifications in ply.lex, which token gets higher preference? I know that for function rules, the ones defined first have higher preference, and that for string rules, longer regular expressions have higher preference.

What about when I have a mixture of both string- and function-type token specifications? And does the order of names in the tokens = () tuple affect the preference order?


Solution

  • According to the doc:

    Internally, lex.py uses the re module to do its pattern matching. When building the master regular expression, rules are added in the following order:

    1. All tokens defined by functions are added in the same order as they appear in the lexer file.
    2. Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).

    If this is correct, tokens defined by functions have higher "priority" than tokens defined by strings. The order of names in the tokens tuple does not matter; it only declares which token types exist.
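
    You can verify this with a minimal, runnable sketch built from the snippet in the question. The t_ignore rule, the t_error handler, and the sample input '12' are additions for illustration. Because t_NUMBER1 is a function rule, its pattern is added to the master regular expression first, so '12' should be reported as NUMBER1 even though the string rule t_NUMBER2 also matches:

    import ply.lex as lex

    tokens = ('NUMBER2', 'NUMBER1')

    def t_NUMBER1(t):
        r'[0-9]+'
        return t

    t_NUMBER2 = r'[0-9][0-9]'

    t_ignore = ' '           # skip spaces between tokens

    def t_error(t):
        t.lexer.skip(1)      # skip characters that match no rule

    lexer = lex.lex()
    lexer.input('12')
    for tok in lexer:
        print(tok.type, tok.value)   # expected: NUMBER1 12

    With this setup, t_NUMBER2 can never win on any input, since every string it matches is also matched by the earlier function rule's pattern [0-9]+.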