I have this lexer grammar:

```
WS
    : ((' ' | '\t' | '\r' | '\n')+ | '\\' '\n') -> skip
    ;
T_QUOTED
    : '"'
    ;
T_CONFDIR_MYDIR
    : 'MyDirective' -> pushMode(mydir)
    ;
T_COMMENT
    : '#' .*? '\r'? '\n'
    ;

mode mydir;

T_MYDIRARG
    : ~([\\" ])+ -> popMode
    ;
```
And this is the input:

```
MyDirective "LiteralString"
```
When I try to parse it (with Python, actually), I get these errors:

```
line 4:21 token recognition error at: ' '
line 4:22 token recognition error at: '"'
line 4:23 extraneous input 'LiteralString' expecting '"'
line 5:0 mismatched input '<EOF>' expecting T_MYDIRARG
```
It seems that once the lexer switches to `mydir`, the tokens from the default mode (`WS`, `T_QUOTED`) disappear.

Why doesn't the lexer recognize the space and the `"` characters, since those are defined as `WS` and `T_QUOTED`? What would be the expected solution?

Thanks.
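If you switch to the `mydir` mode after matching `MyDirective`, the next character is a space, which `mydir` does not recognize.

While the lexer is in `mydir`, it only tries the rules defined in that mode, not rules from any other mode (including the default mode). In your case, `mydir` only recognizes `T_MYDIRARG` tokens. That explains the errors: the space and the opening `"` are each reported and dropped, `LiteralString` is matched as `T_MYDIRARG` (which pops back to the default mode), and only the closing `"` becomes a `T_QUOTED`, so the parser sees `LiteralString` where it expected a `"`.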
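To make the mode scoping concrete, here is a toy model in Python. It is not how the ANTLR runtime is implemented: `GRAMMAR`, `tokenize` and the regex translations of your rules are hypothetical, and it assumes ANTLR's usual lexer policy (only the active mode's rules are tried, the longest match wins, the earlier rule wins a tie, and one character is dropped after a recognition error). Run on your input, it reproduces the two recognition errors:

```python
import re

# Hypothetical toy model of ANTLR lexer modes (NOT the ANTLR runtime).
# Each mode lists its rules in grammar order as (token name, regex, action);
# action is None (emit), "skip", ("push", mode) or "pop".
GRAMMAR = {
    "DEFAULT_MODE": [
        ("WS", r"[ \t\r\n]+|\\\n", "skip"),
        ("T_QUOTED", r'"', None),
        ("T_CONFDIR_MYDIR", r"MyDirective", ("push", "mydir")),
        ("T_COMMENT", r"#.*?\r?\n", None),
    ],
    "mydir": [
        ("T_MYDIRARG", r'[^\\" ]+', "pop"),
    ],
}


def tokenize(text, grammar):
    """Return (tokens, errors); only rules of the active mode are tried."""
    modes = ["DEFAULT_MODE"]  # mode stack; the top entry is the active mode
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        best = None
        for name, pattern, action in grammar[modes[-1]]:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[0].end()):
                best = (m, name, action)
        if best is None:
            # No rule of the active mode matches: report, drop one char, go on.
            errors.append(f"token recognition error at: {text[pos]!r}")
            pos += 1
            continue
        m, name, action = best
        if action != "skip":
            tokens.append((name, m.group()))
        if isinstance(action, tuple):
            modes.append(action[1])  # pushMode(...)
        elif action == "pop":
            modes.pop()  # popMode
        pos = m.end()
    tokens.append(("EOF", "<EOF>"))
    return tokens, errors


tokens, errors = tokenize('MyDirective "LiteralString"\n', GRAMMAR)
for error in errors:
    print(error)
# token recognition error at: ' '
# token recognition error at: '"'
for name, value in tokens:
    print(name, repr(value))
# T_CONFDIR_MYDIR 'MyDirective'
# T_MYDIRARG 'LiteralString'
# T_QUOTED '"'
# EOF '<EOF>'
```

The closing quote is lexed after `popMode`, back in the default mode, which is why it still comes out as `T_QUOTED`.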
Looks like what you're after is something like this:

```
WS
    : ((' ' | '\t' | '\r' | '\n')+ | '\\' '\n') -> skip
    ;
T_QUOTED_OPEN
    : '"' -> pushMode(mydir)
    ;
T_CONFDIR_MYDIR
    : 'MyDirective'
    ;
T_COMMENT
    : '#' .*? '\r'? '\n'
    ;

mode mydir;

T_QUOTED_CLOSE
    : '"' -> popMode
    ;
T_MYDIRARG
    : ~([\\" ])+
    ;
```
which will produce the following:

```
5 tokens:
1 T_CONFDIR_MYDIR 'MyDirective'
2 T_QUOTED_OPEN '"'
3 T_MYDIRARG 'LiteralString'
4 T_QUOTED_CLOSE '"'
5 EOF '<EOF>'
```
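A toy mode-stack lexer in Python can reproduce this token list. It is a hypothetical sketch, not the ANTLR runtime: `FIXED` and `tokenize` are illustrative names, the rules are hand-translated to regexes, and it assumes ANTLR's longest-match, first-rule-wins policy with only the active mode's rules in play. To stay short, it raises on a recognition error instead of recovering.

```python
import re

# Hypothetical toy model of ANTLR lexer modes (NOT the ANTLR runtime),
# loaded with the fixed rules as (token name, regex, action) in grammar order;
# action is None (emit), "skip", ("push", mode) or "pop".
FIXED = {
    "DEFAULT_MODE": [
        ("WS", r"[ \t\r\n]+|\\\n", "skip"),
        ("T_QUOTED_OPEN", r'"', ("push", "mydir")),
        ("T_CONFDIR_MYDIR", r"MyDirective", None),
        ("T_COMMENT", r"#.*?\r?\n", None),
    ],
    "mydir": [
        ("T_QUOTED_CLOSE", r'"', "pop"),
        ("T_MYDIRARG", r'[^\\" ]+', None),
    ],
}


def tokenize(text, grammar):
    """Return the token list; raise on input the active mode cannot match."""
    modes = ["DEFAULT_MODE"]  # mode stack; the top entry is the active mode
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern, action in grammar[modes[-1]]:
            m = re.compile(pattern).match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and m.end() > pos and (best is None or m.end() > best[0].end()):
                best = (m, name, action)
        if best is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        m, name, action = best
        if action != "skip":
            tokens.append((name, m.group()))
        if isinstance(action, tuple):
            modes.append(action[1])  # pushMode(...)
        elif action == "pop":
            modes.pop()  # popMode
        pos = m.end()
    tokens.append(("EOF", "<EOF>"))
    return tokens


tokens = tokenize('MyDirective "LiteralString"\n', FIXED)
print(len(tokens), "tokens:")
for i, (name, value) in enumerate(tokens, 1):
    print(i, name, repr(value))
```

Because the space after `MyDirective` is now consumed while the lexer is still in the default mode, `WS` skips it, and the mode switch happens only on the opening quote.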