Antlr4 partially matches but does not report an error


Here is my ANTLR4 grammar:

grammar TestExpr;

prog: stat ;

stat: expr
    ;

expr : expr '|' expr #orJoin
     | expr '&' expr  #andJoin
     | '(' expr ')'  #nested
     | KEY '=' value  #kv
     ;
value: KEY | VALUE;

KEY : [a-zA-Z] [a-zA-Z0-9_-]* ;
VALUE: [a-zA-Z0-9] [a-zA-Z0-9._-]* ;
WS : [ \t]+ -> skip ; // toss out whitespace

If the input is "a233=A(", the parser matches only "a233=A" and silently ignores the trailing "(". I expected it to report an error, but it didn't.


Solution

  • Change your prog rule to:

    prog: stat EOF;
    

    By default, an ANTLR parser rule only has to match a prefix of the input. The parser consumes tokens for as long as they fit the rule, and it stops once the rule is complete and the next token cannot continue it. Everything up to that point is valid, so no error is reported, even though unread input remains. In your example, "a233=A" is a complete expr, and the "(" that follows is simply never looked at.

    By adding the built-in EOF token, you require the rule to match ALL of the input, up to and including the end of file, so the leftover "(" is now reported as a syntax error. For this reason it's considered a best practice to end any rule that is meant to parse the entire input with an EOF token.
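
To make the difference concrete, here is a minimal hand-written sketch in Python. It is not ANTLR-generated code and does not use the ANTLR runtime. The `ToyParser` class, the `tokenize` and `parse` functions, and the `require_eof` flag are all hypothetical names invented for this illustration, and the grammar is simplified (values are plain identifiers, and `|` and `&` share one precedence level). It only shows why a start rule without EOF accepts a valid prefix, while the same rule followed by EOF rejects the leftover "(".

```python
import re

# Simplified lexer: an identifier-like word, or one punctuation token.
# (Assumption: the real grammar's VALUE token is not modelled here.)
TOKEN = re.compile(r"\s*([A-Za-z][A-Za-z0-9_-]*|[=|&()])")


def tokenize(text):
    """Split text into tokens and append an explicit end-of-input marker."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    tokens.append("<EOF>")
    return tokens


def is_word(tok):
    # Punctuation and "<EOF>" do not start with a letter.
    return tok[0].isalpha()


class ToyParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, predicate, what):
        tok = self.peek()
        if not predicate(tok):
            raise SyntaxError(f"expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        # expr: primary (('|' | '&') primary)*
        self.primary()
        while self.peek() in ("|", "&"):
            self.pos += 1
            self.primary()
        # Any other token just ends the loop: no error is raised here.

    def primary(self):
        # primary: '(' expr ')' | KEY '=' value
        if self.peek() == "(":
            self.pos += 1
            self.expr()
            self.expect(lambda t: t == ")", "')'")
        else:
            self.expect(is_word, "KEY")
            self.expect(lambda t: t == "=", "'='")
            self.expect(is_word, "value")


def parse(text, require_eof):
    """Parse text and return the tokens that were left unconsumed."""
    parser = ToyParser(tokenize(text))
    parser.expr()
    if require_eof:
        # Analogue of "prog: stat EOF;": the rule must reach end of input.
        parser.expect(lambda t: t == "<EOF>", "end of input")
    return parser.tokens[parser.pos:]
```

Calling `parse("a233=A(", require_eof=False)` returns the unread tokens without complaint, which mirrors the behaviour in the question. With `require_eof=True` the same input raises a `SyntaxError`, which mirrors what adding EOF to the start rule does.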