ANTLR: "for" keyword used for loops conflicts with "for" used in messages

I have the following grammar:

myg                : line+ EOF ;

line                : ( for_loop | command params ) NEWLINE;

for_loop : FOR WORD INT DO NEWLINE stmt_body;

stmt_body: line+ END;

params              : ( param | WHITESPACE)*;

param                : WORD | INT;

command             : WORD;


fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;
fragment DIGIT : [0-9] ;

WORD                : (LOWERCASE | UPPERCASE | DIGIT | [_."'/\\-])+ (DIGIT)* ;
INT : DIGIT+ ;
WHITESPACE          : (' ' | '\t')+ -> skip;
NEWLINE             : ('\r'? '\n' | '\r')+ -> skip;
FOR: 'for';
DO: 'do';
END: 'end';

My problem is that the following two lines are both valid in this language:

message please wait for 90 seconds

This would be a valid command printing a message with the word "for".

for n 2 do

This would be the beginning of a for loop.

With the current lexer the for loop is never matched, because 'for' is tokenized as WORD: the WORD rule appears before the FOR rule.

I could solve that by putting the FOR rule before the WORD rule, but then the 'for' in a message would be matched by the FOR rule as well.

Solution

This is the typical keyword versus identifier problem, and I thought there were quite a number of questions about it here on Stack Overflow. But to my surprise I can only find an old answer of mine for ANTLR3.

Although the principle described there still applies, with ANTLR4 you can no longer change the returned token type in a parser rule.

Two steps are required to make your scenario work:

1. Define the keywords before the WORD rule. That way they get their own token types, which you need for the grammar parts that require specific keywords.
2. Selectively add the keywords to the rules that parse names, wherever you want to allow those keywords as well.
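
As a sketch of the first step, the lexer section of the question's grammar could be reordered roughly like this. Only the position of the keyword rules changes; the rule bodies are copied from the question. ANTLR4 prefers the longest match and, when two rules match text of the same length, the rule defined first, so 'for' becomes FOR while longer input such as 'format' is still a WORD.

```antlr
// Keyword rules come first. When two lexer rules match the same text,
// ANTLR4 picks the one that is defined earlier in the grammar.
FOR  : 'for' ;
DO   : 'do' ;
END  : 'end' ;

// Unchanged from the question. WORD still wins for longer input such as
// "format" or "done", because the lexer prefers the longest match.
WORD : (LOWERCASE | UPPERCASE | DIGIT | [_."'/\\-])+ (DIGIT)* ;
INT  : DIGIT+ ;

fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
fragment DIGIT     : [0-9] ;
```

The same tie-breaking affects INT: WORD also matches plain digit sequences and is defined before INT here, so an input like 2 would come out as WORD. If for_loop should actually see an INT, moving INT above WORD is one option, though that goes beyond the two steps described here.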

For the second step, modify your rules:

param: WORD | INT | commandKeyword;
command: WORD | commandKeyword;
commandKeyword: FOR | DO | END; // Keywords allowed as names in commands.