Let's say I were lexing a Ruby method definition:
def print_greeting(greeting = "hi")
end
Is it the lexer's job to maintain state and emit relevant tokens, or should it be relatively dumb? Notice in the above example that the greeting param has a default value of "hi". In a different context, greeting = "hi" is a variable assignment which sets greeting to "hi". Should the lexer emit generic tokens such as IDENTIFIER EQUALS STRING, or should it be context-aware and emit something like PARAM_NAME EQUALS STRING?
I tend to make the lexer as stupid as I possibly can, so I would have it emit the IDENTIFIER EQUALS STRING tokens. At lexical analysis time there is (most of the time) no information available about what the tokens should represent. Putting grammar rules like this in the lexer only pollutes it with (very) complex syntax rules, and those belong in the parser.
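As a rough illustration of what "stupid" means here, the sketch below lexes both snippets with Ruby's standard StringScanner. The token names, the TOKEN_PATTERNS table and the lex method are made up for this example and cover only the handful of tokens in the question; this is not how Ruby's own lexer is implemented.

```ruby
require "strscan"

# Hypothetical token table for this example only; order matters, so the
# keywords are tried before the generic IDENTIFIER pattern.
TOKEN_PATTERNS = [
  [:DEF,        /def\b/],
  [:END,        /end\b/],
  [:IDENTIFIER, /[a-z_][A-Za-z0-9_]*/],
  [:STRING,     /"[^"]*"/],
  [:EQUALS,     /=/],
  [:LPAREN,     /\(/],
  [:RPAREN,     /\)/]
].freeze

# Context-free lexer: it never looks at the tokens around the current one,
# so it cannot tell a default parameter from an assignment.
def lex(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/) # whitespace produces no token
    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise "unexpected character at offset #{scanner.pos}" unless type
    tokens << [type, scanner.matched]
  end
  tokens
end

assignment = lex('greeting = "hi"')
definition = lex("def print_greeting(greeting = \"hi\")\nend")

p assignment.map(&:first)
p definition.map(&:first)
```

Both inputs contain the same IDENTIFIER EQUALS STRING run. Only the surrounding DEF ... LPAREN ... RPAREN tokens differ, and it is the parser, seeing that structure, that decides the run is a default parameter rather than an assignment.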