
Why is ANTLR not recognizing my numeral token


I'm a total ANTLR neophyte and am trying my best to get up to speed on the tech.

I'm at a loss to understand why my token defined as NUMERAL is not matching a number.

I'm sure this is some bone-headed move on my part. If it's something else (e.g. the priority or location of rules), I can definitely post my full lexer, but here is the token I'm having issues with:

DECIMAL_NUMERAL  : NUMERAL ('.' NUMERAL)?;
NUMERAL          : DIGIT+;
fragment DIGIT  : [0-9];

The text I'm using in ANTLR Lab is: actions+=/car.shift,if=tachvalue>3300

And here is the parser, which I've trimmed down quite a bit to isolate the error:

parser grammar SimcParser;
options { tokenVocab = SimcLexer; }

profile : 
    (comment | action_base)*
    EOF;

action_base : 
    ( conditionalAction) NEWLINE;

comment : HASH SENTENCE NEWLINE;
actionpart : ACTIONS (DOT subName=IDENTIFIER)?;
conditionalAction :
    actionpart
    (OP_EQ | ASSIGN)
    actionName=dotted_name
    ACTIF
    exp;
    
dotted_name :
    IDENTIFIER (DOT IDENTIFIER)?;
    
eval :
    BITWISE_OR | LT | GT | OP_EQ | OP_NOT | OP_LE | OP_GE;

exp :
  propertyName=IDENTIFIER eval qualifier
;

qualifier :
    NUMERAL;

My resulting parse tree shows an error. Obviously I'm missing something big: I thought that the NUMERAL token (in qualifier) would catch a full numeric value with multiple digits, but it does not. It doesn't work on a single digit either.


Solution

  • If you want to keep recognizing a NUMERAL besides a DECIMAL_NUMERAL, move NUMERAL above DECIMAL_NUMERAL:

    NUMERAL          : DIGIT+;
    DECIMAL_NUMERAL  : NUMERAL ('.' NUMERAL)?; // you can remove the `?`: a plain digit sequence now always becomes a NUMERAL, so this rule never wins without the fraction
    

    The way ANTLR creates tokens is as follows:

    1. pick the lexer rule that matches the most characters
    2. whenever two or more rules match the same number of characters, the rule defined first "wins"

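    Because input like "123" matches both DECIMAL_NUMERAL and NUMERAL with the same length, the order is important. In your lexer DECIMAL_NUMERAL is defined first, so "3300" becomes a DECIMAL_NUMERAL token, and your qualifier rule, which only accepts NUMERAL, fails on it. Note that with my suggestion, the input "123" will now never become a DECIMAL_NUMERAL but always a NUMERAL! If you need to match any numerical token in your parser, combine the two in one parser rule: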

    number
     : DECIMAL_NUMERAL
     | NUMERAL
     ;
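To make the two tokenizing rules concrete, here is a rough Python simulation. It is not ANTLR's implementation: regexes stand in for the lexer rules, the IDENTIFIER and GT patterns are assumptions (the full lexer wasn't posted), and eval is reduced to GT. It shows that with the question's order "3300" becomes a DECIMAL_NUMERAL that qualifier rejects, that moving NUMERAL first fixes this, and that a number-style rule accepting both token types works even with the original order.

```python
import re

# Hypothetical stand-ins for the lexer rules, as (token type, regex) pairs in
# definition order. IDENTIFIER and GT are assumed; the question doesn't show them.
QUESTION_ORDER = [
    ("DECIMAL_NUMERAL", r"[0-9]+(\.[0-9]+)?"),
    ("NUMERAL", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("GT", r">"),
]
# The suggested fix: NUMERAL moved above DECIMAL_NUMERAL.
SUGGESTED_ORDER = [QUESTION_ORDER[1], QUESTION_ORDER[0]] + QUESTION_ORDER[2:]


def tokenize(rules, text):
    """Repeatedly pick the rule matching the most characters; on a tie,
    the rule defined first wins (strict '>' keeps the earlier rule)."""
    tokens, pos = [], 0
    while pos < len(text):
        best_type, best_len = None, 0
        for token_type, pattern in rules:
            # These greedy regexes happen to return each rule's longest match.
            m = re.compile(pattern).match(text, pos)
            if m and len(m.group()) > best_len:
                best_type, best_len = token_type, len(m.group())
        if best_type is None:
            raise ValueError(f"no rule matches {text[pos]!r} at position {pos}")
        tokens.append((best_type, text[pos:pos + best_len]))
        pos += best_len
    return tokens


def parses_exp(tokens, value_types):
    """exp : IDENTIFIER eval <value> ; with eval simplified to GT."""
    types = [t for t, _ in tokens]
    return (len(types) == 3 and types[0] == "IDENTIFIER"
            and types[1] == "GT" and types[2] in value_types)


QUALIFIER = {"NUMERAL"}                  # qualifier : NUMERAL ;
NUMBER = {"DECIMAL_NUMERAL", "NUMERAL"}  # number : DECIMAL_NUMERAL | NUMERAL ;

original = tokenize(QUESTION_ORDER, "tachvalue>3300")
reordered = tokenize(SUGGESTED_ORDER, "tachvalue>3300")
print(original[2])                        # ('DECIMAL_NUMERAL', '3300')
print(reordered[2])                       # ('NUMERAL', '3300')
print(parses_exp(original, QUALIFIER))    # False: the error from the question
print(parses_exp(reordered, QUALIFIER))   # True
print(parses_exp(original, NUMBER))       # True
print(tokenize(SUGGESTED_ORDER, "3.14"))  # [('DECIMAL_NUMERAL', '3.14')]
```

The strict `>` comparison is what encodes "the rule defined first wins": a later rule only replaces the current candidate when it matches strictly more characters.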