python, token, ply

What is the logic behind PLY lex tokens?


def t_NUMBER_LITERAL(t):
    r'\d+'
    t.value = int(t.value)  # convert the matched digits from str to int
    return t

If I understand correctly, this converts every token that is a number into an int, because of the r'\d+'. How can I select only variables, strings (any number of characters inside straight double quotation marks), comments (anything between braces { } is accepted but ignored), etc. (any other special thing I want)? Why does r'\d+' select only the numbers?


Solution

  • PLY's lexer takes the regular expression in the function's docstring and uses it to match tokens. So the t_NUMBER_LITERAL function is only called on substrings made up entirely of digits, because that is what the regex \d+ matches. Specifically, \d matches any single digit, and the + quantifier makes it match a sequence of one or more. To select other kinds of tokens, write additional t_ rules whose docstrings contain a regex for each kind.
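
To see how one regex per token type picks out variables, strings and comments, here is a minimal sketch that uses only Python's standard re module rather than PLY itself. The token names (COMMENT, STRING_LITERAL, IDENTIFIER, SKIP) and the tokenize helper are hypothetical, chosen for illustration, and the patterns are one reasonable reading of the rules in the question (for example, strings may not span lines). In PLY, each pattern would instead be the docstring of its own t_ function, and a rule function that returns nothing causes the matched text to be discarded, which is how comments can be accepted but ignored.

```python
import re

# Candidate patterns, tried in this order at each position.
# Names other than NUMBER_LITERAL are hypothetical, for illustration only.
TOKEN_PATTERNS = [
    ("COMMENT", r'\{[^}]*\}'),                   # anything between braces
    ("STRING_LITERAL", r'"[^"\n]*"'),            # characters inside double quotes
    ("NUMBER_LITERAL", r'\d+'),                  # one or more digits
    ("IDENTIFIER", r'[A-Za-z_][A-Za-z0-9_]*'),   # variable names
    ("SKIP", r'\s+'),                            # whitespace between tokens
]

# Combine the patterns into one regex with a named group per token type.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS)
)


def tokenize(text):
    """Return (type, value) pairs, dropping comments and whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        kind = match.lastgroup   # name of the group that matched
        value = match.group()
        pos = match.end()
        if kind in ("COMMENT", "SKIP"):
            continue             # accepted but ignored
        if kind == "NUMBER_LITERAL":
            value = int(value)   # same conversion as t_NUMBER_LITERAL
        elif kind == "STRING_LITERAL":
            value = value[1:-1]  # strip the surrounding quotes
        tokens.append((kind, value))
    return tokens


print(tokenize('count 42 "hello world" {this is ignored} x1'))
```

Only the digits become a NUMBER_LITERAL because \d+ cannot match letters, quotes or braces; every other piece of the input is claimed by whichever pattern describes it.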