I have the following ANTLR grammar:
grammar MyGrammar;
doc : intro planet;
intro : 'hi';
planet : 'world';
MLCOMMENT
    : '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; };
WHITESPACE
    : ( (' ' | '\t' | '\f')+
      | // handle newlines
        ( '\r\n' // DOS/Windows
        | '\r'   // Macintosh
        | '\n'   // Unix
        )
      )
      { $channel = HIDDEN; };
In the ANTLRWorks 1.2.3 interpreter, the inputs "hi world", "hi/**/world" and "hi /*A*/ world" work, as expected.
However, the input "hiworld", which shouldn't work, is also accepted.
How do I make "hiworld" fail? How do I force at least one whitespace (or comment) between "hi" and "world"?
Note that I've used only MLCOMMENT and WHITESPACE in this example to simplify, but other kinds of comments would be supported.
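You need to create a general ID token. Without one, the only word tokens the lexer knows are the literals 'hi' and 'world', so it splits "hiworld" into those two tokens and the parser accepts them. Since the lexer builds the longest token it can, once an ID rule exists it sees the input "hiworld" as a single ID token, because that match is longer than "hi" or "world" by themselves. The parser then rejects the input, since doc expects 'hi' followed by 'world'. Such a rule might look like: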
ID : ('a'..'z' | 'A'..'Z')+;
As an example, that's exactly how lexers for programming languages separate the "do" keyword from "double" (a type keyword that starts with "do") or "done" (a variable name).
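The longest-match idea can be tried outside ANTLR with a small Python sketch. This is only an illustration under stated assumptions, not how ANTLR generates its lexer: the names TOKEN_RE, KEYWORDS, tokenize and accepts are hypothetical, and keywords are recognized here by matching a whole run of letters first and then checking it against a keyword set.

```python
import re

# Hypothetical token table for illustration; not ANTLR's actual implementation.
KEYWORDS = {"hi", "world"}
HIDDEN = {"MLCOMMENT", "WHITESPACE"}  # tokens the parser never sees

TOKEN_RE = re.compile(r"""
    (?P<ID>[A-Za-z]+)                      # longest possible run of letters
  | (?P<MLCOMMENT>/\*.*?\*/)               # non-greedy multi-line comment
  | (?P<WHITESPACE>[ \t\f]+|\r\n|\r|\n)    # blanks or a single newline
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Return the visible (kind, lexeme) pairs for text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind in HIDDEN:
            continue  # same effect as $channel = HIDDEN
        if kind == "ID" and lexeme in KEYWORDS:
            kind = lexeme  # an exact keyword match becomes a keyword token
        tokens.append((kind, lexeme))
    return tokens


def accepts(text):
    """Mimic the parser rule doc : intro planet."""
    return [kind for kind, _ in tokenize(text)] == ["hi", "world"]
```

With this sketch, accepts("hi world"), accepts("hi/**/world") and accepts("hi /*A*/ world") all succeed, while accepts("hiworld") fails because the whole word is consumed as one ID token.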