I have this Jison lexer and parser:
%lex
%%
\s+ /* skip whitespace */
'D01' return 'D01'
[xX][+-]?[0-9]+ return 'COORD'
<<EOF>> return 'EOF'
. return 'INVALID'
/lex
%start source
%%
source
: command EOF;
command
: D01 COORD;
It will tokenize and parse D01 X45 but not D01X45. What am I missing?
Unlike (f)lex, and indeed the vast majority of scanner generators, jison scanners do not implement the longest-match rule. Instead, the first matching pattern wins.
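In order to make this work for keywords, jison scanners also implement the restriction that simple literal strings, like "D01", only match if they end on a word boundary. In D01X45 the 1 is immediately followed by X, another word character, so there is no word boundary after D01. The literal rule therefore fails, and the input falls through to the catch-all . rule, which returns INVALID.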
The workaround is to enclose the literal string pattern with redundant parentheses:
("D01") { return 'D01'; }
This is documented in the jison wiki.
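The word-boundary behaviour and the effect of the parentheses can be reproduced with plain JavaScript regular expressions. This is only a sketch: it does not run jison, and the two regexes are an assumed approximation of what jison generates for each form of the rule (the helper name firstToken is made up for the illustration).

```javascript
// Assumed approximation of the regexes jison builds for the two rule forms.
// The real generated patterns may differ in detail.
const literalRule = /^(?:D01\b)/; // 'D01'   : a word boundary is implicitly required
const parenRule = /^(?:(D01))/;   // ("D01") : no word boundary is added

// Hypothetical helper: return the text the rule matches at the start of input,
// or null if the rule does not match there.
function firstToken(rule, input) {
  const m = rule.exec(input);
  return m ? m[0] : null;
}

console.log(firstToken(literalRule, "D01 X45")); // "D01"
console.log(firstToken(literalRule, "D01X45"));  // null
console.log(firstToken(parenRule, "D01X45"));    // "D01"
```

With the literal form, D01X45 produces no D01 match, which is why the scanner ends up in the catch-all rule. With the parenthesised form, D01 is matched and the rest of the input is left for the COORD rule.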