I have the following ANTLR grammar:
relation
: IDENTIFIER EQUAL relative_date
;
relative_date
: K_NOW (PLUS | MINUS) NUMERIC_LITERAL TIME_UNIT
;
IDENTIFIER
: //'"' (~'"' | '""')* '"'
'`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_.0-9]*
;
TIME_UNIT
: ('h'|'m'|'s'|'d'|'w'|'M'|'y'|'q')
;
PLUS : '+';
MINUS : '-';
EQUAL: '=';
K_NOW : N O W;
NUMERIC_LITERAL
: [0-9]+ ;
If I put TIME_UNIT before IDENTIFIER, the parser behaves as follows:
something = now - 5d
works, but
d = now - 5d
does NOT work: it fails at the first d and says IDENTIFIER is required.
If I put TIME_UNIT after IDENTIFIER, then
something = now - 5d
fails at the second d and says TIME_UNIT is required, and
d = now - 5d
also fails at the second d and says TIME_UNIT is required.
Can someone help me change the grammar so that it works in both cases? That is, use the TIME_UNIT lexer rule when parsing a relative date and the IDENTIFIER lexer rule otherwise.
ANTLR's lexer tries to match as many characters as possible. When two or more lexer rules match the same number of characters, the rule defined first "wins".
So, the input d matches both TIME_UNIT and IDENTIFIER, but because IDENTIFIER is defined first in the grammar as posted, it wins. In other words: the rule TIME_UNIT will never be matched.
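The tie-breaking behaviour described above can be simulated outside ANTLR. The following is only a minimal Python sketch, not ANTLR's actual implementation: the regular expressions are hypothetical stand-ins for the lexer rules, K_NOW is assumed to be case-insensitive (the N, O, W fragments are not shown in the question), and K_NOW is listed first in both orderings so that only the relative order of TIME_UNIT and IDENTIFIER varies.

```python
import re

# Hypothetical regex stand-ins for the lexer rules in the question.
# Assumption: K_NOW is case-insensitive (the N, O, W fragments are not shown).
K_NOW = ("K_NOW", r"[nN][oO][wW]")
TIME_UNIT = ("TIME_UNIT", r"[hmsdwMyq]")
IDENTIFIER = ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z_.0-9]*")
OTHERS = [
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("EQUAL", r"="),
    ("NUMERIC_LITERAL", r"[0-9]+"),
]

# Two rule orderings that differ only in which of the two rules comes first.
IDENTIFIER_FIRST = [K_NOW, IDENTIFIER, TIME_UNIT] + OTHERS
TIME_UNIT_FIRST = [K_NOW, TIME_UNIT, IDENTIFIER] + OTHERS


def tokenize(text, rules):
    """Longest match wins; on a tie in length, the rule listed first wins."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace is simply skipped in this sketch
            pos += 1
            continue
        best = None  # (rule name, end position) of the best match so far
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            # Strict ">" keeps the earlier rule when two matches tie in length.
            if match and (best is None or match.end() > best[1]):
                best = (name, match.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


if __name__ == "__main__":
    print(tokenize("d = now - 5d", IDENTIFIER_FIRST))
    print(tokenize("d = now - 5d", TIME_UNIT_FIRST))
    print(tokenize("days", TIME_UNIT_FIRST))
```

With IDENTIFIER listed first, both d characters come out as IDENTIFIER; with TIME_UNIT listed first, both come out as TIME_UNIT. A longer name such as days is still an IDENTIFIER in either ordering, because the longest match is preferred before rule order is considered.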
The solution: put TIME_UNIT before IDENTIFIER:
TIME_UNIT
: ('h'|'m'|'s'|'d'|'w'|'M'|'y'|'q')
;
K_NOW
: N O W
;
IDENTIFIER
: //'"' (~'"' | '""')* '"'
'`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_.0-9]*
;
(Note that K_NOW will also need to be placed before IDENTIFIER, otherwise the input now is tokenized as an IDENTIFIER!)
However, the inputs d, h, m, etc. will now never become an IDENTIFIER, because they will always be tokenized as a TIME_UNIT. You cannot change this; that is how ANTLR's lexer works. You can handle this in the parser instead, like this:
identifier
: IDENTIFIER
| TIME_UNIT
;
TIME_UNIT
: ('h'|'m'|'s'|'d'|'w'|'M'|'y'|'q')
;
IDENTIFIER
: //'"' (~'"' | '""')* '"'
'`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_.0-9]*
;
and then use the rule identifier in your parser rules instead of IDENTIFIER:
relation
: identifier EQUAL relative_date
;
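To illustrate why the identifier parser rule resolves the problem, here is a small hand-written Python sketch of the relation and relative_date rules. It is not generated by ANTLR; the function name parse_relation, the token tuples and the returned dictionary are all hypothetical and exist only for this illustration. The token lists are written out by hand in the form the lexer would produce once TIME_UNIT is defined before IDENTIFIER.

```python
def parse_relation(tokens):
    """Sketch of: relation : identifier EQUAL relative_date ;

    `tokens` is a list of (token_type, text) pairs, as a lexer would emit them.
    """
    pos = 0

    def expect(*allowed):
        # Consume the next token if its type is one of `allowed`, else fail.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] not in allowed:
            found = tokens[pos][0] if pos < len(tokens) else "EOF"
            raise SyntaxError(f"expected {' or '.join(allowed)}, found {found}")
        text = tokens[pos][1]
        pos += 1
        return text

    # identifier : IDENTIFIER | TIME_UNIT ;
    name = expect("IDENTIFIER", "TIME_UNIT")
    expect("EQUAL")
    # relative_date : K_NOW (PLUS | MINUS) NUMERIC_LITERAL TIME_UNIT ;
    expect("K_NOW")
    sign = expect("PLUS", "MINUS")
    amount = int(expect("NUMERIC_LITERAL"))
    unit = expect("TIME_UNIT")
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return {"name": name, "sign": sign, "amount": amount, "unit": unit}


if __name__ == "__main__":
    # Tokens for "d = now - 5d" with TIME_UNIT defined before IDENTIFIER.
    short_name = [
        ("TIME_UNIT", "d"), ("EQUAL", "="), ("K_NOW", "now"),
        ("MINUS", "-"), ("NUMERIC_LITERAL", "5"), ("TIME_UNIT", "d"),
    ]
    # Tokens for "something = now - 5d" under the same lexer ordering.
    long_name = [("IDENTIFIER", "something")] + short_name[1:]
    print(parse_relation(short_name))
    print(parse_relation(long_name))
```

Both inputs from the question are accepted here, because the left-hand side may be either token type while the relative date still insists on a TIME_UNIT at the end.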