java, eclipse, antlr4

Why does ANTLR parser not throw an error for invalid numerical input in Java?


I built the following ANTLR grammar (antlr4-runtime-4.13.0) for a simple condition:

grammar Condition;

@header {
package expression;
}

condition
    :(expression)('OR' expression)*
    ;
    
expression
    : IDENT '=' NUM
    ;
    
IDENT : ('a'..'z' | 'A'..'Z')+;
NUM   : [0-9]+;
WS    : [ \t\r\n]+ -> skip;

I used this Java main method to test it:

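import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStream;

public class TestANTLRGrammar extends ConditionBaseListener   {
    
    public static void main(String[] args) {
        String entry = "id = 889xx88 OR y = 7";
        // Lex the input string and feed the resulting tokens to the parser
        ConditionLexer lexer = new ConditionLexer(CharStreams.fromString(entry));
        TokenStream tokens = new CommonTokenStream(lexer);
        ConditionParser parser = new ConditionParser(tokens);
        // Start parsing at the rule "condition"
        parser.condition();
        System.out.println(parser.getNumberOfSyntaxErrors());
    }
}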

I expected the parser to throw an error, because "889xx88" shouldn't be considered a number. Instead, the parser recognized "id = 889" and stopped without continuing to the rest of the condition (i.e. "OR y = 7"), and getNumberOfSyntaxErrors() returned 0. Can anyone help me fix this problem, please?


Solution

  • For the input id = 889xx88 OR y = 7, the lexer will produce the following 9 tokens:

    • IDENT: id
    • '=': =
    • NUM: 889
    • IDENT: xx
    • NUM: 88
    • 'OR': OR
    • IDENT: y
    • '=': =
    • NUM: 7
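The token list above can be reproduced without ANTLR. The following is only a hand-written approximation of the three lexer rules plus the implicit '=' and 'OR' literals, using java.util.regex; the class name TokenSketch and its tokenize method are made up for this illustration and are not part of the generated ConditionLexer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written approximation of the Condition lexer; NOT code generated by ANTLR.
class TokenSketch {
    // One named group per rule. 'OR' comes first and must not be followed by a
    // letter, so "OR" becomes the literal token while "ORx" is a longer IDENT.
    private static final Pattern RULES = Pattern.compile(
        "(?<OR>OR(?![a-zA-Z]))|(?<IDENT>[a-zA-Z]+)|(?<NUM>[0-9]+)|(?<EQ>=)|(?<WS>[ \\t\\r\\n]+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            if (m.group("WS") == null) { // whitespace is skipped, as in "WS -> skip"
                String type = m.group("IDENT") != null ? "IDENT"
                            : m.group("NUM") != null ? "NUM"
                            : m.group("OR") != null ? "'OR'" : "'='";
                tokens.add(type + ": " + m.group());
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints one token per line for the input from the question
        tokenize("id = 889xx88 OR y = 7").forEach(System.out::println);
    }
}
```

The point of the sketch is that the lexer never sees "889xx88" as one unit: NUM matches as many digits as it can, then IDENT takes over at the first letter, so no lexer error occurs.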

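    If you now let the parser rule condition consume these tokens, it matches IDENT '=' NUM (id = 889), sees that the next token (IDENT xx) is not 'OR', and stops parsing. Nothing in the rule requires the remaining tokens to be consumed, so no error is reported.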

    As mentioned by kaby76 in the comments: create a start rule that contains the built-in EOF (end-of-file) token to make sure all tokens are consumed (or an error will be reported, if it cannot do so):

    start
     : condition EOF
     ;
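To see why the trailing EOF makes the difference, here is a small hand-written recursive-descent sketch of the parser rules in plain Java. It is not what ANTLR generates, and the class EofSketch and its methods are hypothetical names for this illustration. It also throws on the first mismatch, whereas a real ANTLR parser reports the error and tries to recover.

```java
import java.util.List;

// Hand-written sketch of the parser rules; NOT ANTLR-generated code.
class EofSketch {
    private final List<String> types; // token types in the order the lexer produced them
    private int pos = 0;

    EofSketch(List<String> types) { this.types = types; }

    // Type of the next token, or "EOF" once all tokens are consumed
    private String peek() { return pos < types.size() ? types.get(pos) : "EOF"; }

    private void match(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException(
                "expected " + expected + " but found " + peek() + " at token " + pos);
        }
        pos++;
    }

    // expression : IDENT '=' NUM ;
    void expression() { match("IDENT"); match("="); match("NUM"); }

    // condition : expression ('OR' expression)* ;
    void condition() {
        expression();
        while (peek().equals("OR")) { match("OR"); expression(); }
    }

    // start : condition EOF ;
    void start() { condition(); match("EOF"); }

    int consumed() { return pos; }

    public static void main(String[] args) {
        // Token types for "id = 889xx88 OR y = 7"
        List<String> types =
            List.of("IDENT", "=", "NUM", "IDENT", "NUM", "OR", "IDENT", "=", "NUM");

        EofSketch withoutEof = new EofSketch(types);
        withoutEof.condition(); // returns normally after the first expression
        System.out.println("condition consumed " + withoutEof.consumed()
            + " of " + types.size() + " tokens");

        try {
            new EofSketch(types).start(); // fails: leftover tokens instead of EOF
        } catch (IllegalStateException e) {
            System.out.println("start failed: " + e.getMessage());
        }
    }
}
```

In the sketch, condition() returns after three tokens without complaint, while start() fails at the fourth token because it is IDENT rather than EOF. With the real grammar, the equivalent change in the test class is to call parser.start() instead of parser.condition().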
    

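    Note that by default the parser only prints the error to STDERR and then (tries to) continue parsing; getNumberOfSyntaxErrors() will still count it. This is ANTLR's default error recovery mode. If you want to change that, for example to throw an exception on the first error, you can replace the default listener via parser.removeErrorListeners() and parser.addErrorListener(...), or install a different strategy such as BailErrorStrategy via parser.setErrorHandler(...). Searching for "ANTLR custom error recovery" or "ANTLR custom error handler" (or similar) will turn up details.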