I am trying to match complex numbers written in different notations, one of which uses the cis function like this: MODULUS cis PHASE
The problem is that my IDENTIFIER rule matches cis together with the start of the number that follows it. Since that match is longer than the CIS token, the lexer always returns an IDENTIFIER token. How can I avoid that?
Here's the grammar:
grammar Sandbox;
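input : number? CIS (UFLOAT | UINT) // not UNSIGNED: it matches the same text as UFLOAT/UINT but is defined after them, so the lexer never emits it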
| IDENTIFIER
;
number : FLOAT
| UFLOAT
| UINT
| INT
;
fragment DIGIT : [0-9] ;
UFLOAT : UINT (DOT UINT? | 'f') ;
FLOAT : SUB UFLOAT ;
UINT : DIGITS ;
INT : SUB UINT ;
UNSIGNED : UFLOAT
| UINT
;
fragment DIGITS : DIGIT+ ;
// Specific lexer rules
CIS : 'cis' ;
SUB : '-' ;
DOT : '.' ;
WS : [ \t]+ -> skip ;
NEWLINE : '\r'? '\n' ;
IDENTIFIER : [a-zA-Z_] [a-zA-Z0-9_]* ; // must come after CIS so that i or cis doesn't match this first (rule order only breaks ties between equal-length matches)
Edit:
The input I was trying to parse is the complex number 1+i, written with its modulus and phase like this: 1.4142135623730951cis0.7853981633974483
My actual problem is that the IDENTIFIER rule matches cis0 instead of the CIS lexer rule matching just cis, even though CIS is defined before IDENTIFIER. For this input the lexer produces UFLOAT (1.4142135623730951), IDENTIFIER (cis0), DOT (.) and UINT (7853981633974483).
I vaguely know that ANTLR picks the rule with the longest match, but in this case I want to avoid that.
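I see two solutions here:
Solution 1: match the complete number in a single lexer rule:
COMPLEX: (FLOAT | UFLOAT | UINT | INT) WS* CIS WS* UNSIGNED;
This token is longer than both an identifier and the bare CIS keyword, so the lexer picks it under the longest-match rule (rule order only matters for equal-length matches).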
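A minimal sketch of solution 1 as a complete combined grammar, assuming the parser only needs the complex number as one token (pulling the modulus and phase out of the token text is left to a listener or visitor). The fragments SIGNED_NUM, UNSIGNED_NUM and BLANK are hypothetical names; UNSIGNED_NUM accepts the same text as the question's UFLOAT | UINT. Using fragments instead of token rules keeps these helpers from competing with COMPLEX as tokens of their own.

```antlr
grammar Sandbox;

// Sketch of solution 1: "MODULUS cis PHASE" is lexed as a single COMPLEX token.
input : COMPLEX
      | IDENTIFIER
      ;

// Longer than both 'cis' and an identifier such as "cis0", so the longest-match
// rule picks COMPLEX for input like 1.4142135623730951cis0.7853981633974483.
COMPLEX : SIGNED_NUM BLANK* 'cis' BLANK* UNSIGNED_NUM ;

// Hypothetical helper fragments; fragments never become tokens themselves.
fragment SIGNED_NUM   : '-'? UNSIGNED_NUM ;
fragment UNSIGNED_NUM : DIGIT+ ('.' DIGIT* | 'f')? ;  // same text as UFLOAT | UINT
fragment DIGIT        : [0-9] ;
fragment BLANK        : [ \t] ;

WS         : BLANK+ -> skip ;
NEWLINE    : '\r'? '\n' ;
IDENTIFIER : [a-zA-Z_] [a-zA-Z0-9_]* ;
```

If plain numbers are needed as well, token rules such as the question's UFLOAT and UINT can be added back: whenever a cis part follows the number, COMPLEX is still the longer match.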
Solution 2: the cis sequence is a keyword only when it follows a digit (with optional whitespace in between), right? So you could add a semantic predicate to IDENTIFIER that looks back (LA(-1)) and rejects cis as an identifier when that condition is true.
I'd prefer solution 1, because by convention a single entity (and a complex number, like a float number or a string, is a single logical entity) is matched completely in a lexer rule, not in a parser rule.