
How to resolve this parsing ambiguity in Antlr3


Hopefully this is just the right amount of information to help me solve this problem.

Given the following ANTLR3 grammar:

grammar mygrammar;

program : statement* | function*;

function : ID '(' args ')' '->' statement+ (','statement+) '.' ;    

args    : arg (',' arg)*;       

arg     : ID ('->' expression)?;

statement : assignment
          | number
          | string
          ;

assignment : ID '->' expression;    

string  : UNICODE_STRING;

number : HEX_NUMBER | INTEGER ( '.' INTEGER )?;


// ================================================================

HEX_NUMBER : '0x' HEX_DIGIT+;

INTEGER : DIGIT+;

fragment
DIGIT   :   ('0'..'9');

// (ID, UNICODE_STRING, HEX_DIGIT and the expression rule are referenced above but not shown)

Here is the line that is causing problems in the parser.

my_function(x, y, z -> 42) -> 10001.

ANTLRWorks highlights the final . after 10001 in red as the source of the error.

How can I make this stop throwing an org.antlr.runtime.EarlyExitException?

I am sure this is caused by an ambiguity between my number parser rule and my use of . as an end-of-line delimiter.


Solution

  • There is another ambiguity that also needs fixing. Change:

    program : statement* | function*;
    

    into:

    program  : (statement | function)*;
    

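    (the two are not equivalent: the original accepts either only statements or only functions, while the new rule lets them be mixed in any order; I'm guessing you want the latter)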

    Also, your function rule currently requires at least 2 statements:

    function : ID '(' args ')' '->' statement (','statement)+ '.' ; 
    

    while I'm guessing you really want at least one:

    function : ID '(' args ')' '->' statement (','statement)* '.' ; 
    

    Now, your real problem: because you construct floats in a parser rule, when the parser reaches the end of your input, 10001., it tries to build a float (INTEGER '.' INTEGER) out of it, whereas you want it to match an INTEGER followed by the terminating ., as you already suspected in your question.

    To fix this, give the parser a bit of extra look-ahead so it can "see" past this ambiguity. Do that by adding the syntactic predicate (INTEGER '.' INTEGER)=> in front of the alternative that matches that input:

    number
      :  HEX_NUMBER
      |  (INTEGER '.' INTEGER)=> INTEGER '.' INTEGER
      |  INTEGER
      ;
    

    Now your input will generate the following parse tree:

    [image: parse tree of the input]
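To see why the predicate helps, here is a minimal sketch outside ANTLR, in Python: a hand-written recursive-descent parser for the tail of the suggested function rule, statement (',' statement)* '.', with statements reduced to numbers. The tokenizer, ParseError, Parser and parse_body are hypothetical illustrations, not ANTLR-generated code. The "greedy" mode is a simplified model of a parser that commits to INTEGER '.' INTEGER as soon as a '.' follows an INTEGER; the "predicate" mode checks all three tokens before committing. Because this particular predicate is a fixed three-token sequence, checking it amounts to three tokens of lookahead; ANTLR 3 itself evaluates a syntactic predicate by speculatively matching the fragment.

```python
import re

# Hypothetical lexer for this sketch only: INTEGER, DOT ('.') and COMMA (',').
# It is not the grammar's real lexer; whitespace between tokens is skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<INTEGER>\d+)|(?P<DOT>\.)|(?P<COMMA>,))")


class ParseError(Exception):
    pass


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected character at offset {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens, use_predicate):
        self.tokens = tokens
        self.pos = 0
        self.use_predicate = use_predicate

    def kind(self, offset=0):
        """Token kind `offset` tokens ahead, or "EOF" past the end."""
        i = self.pos + offset
        return self.tokens[i][0] if i < len(self.tokens) else "EOF"

    def match(self, kind):
        if self.kind() != kind:
            raise ParseError(f"expected {kind}, found {self.kind()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def number(self):
        if self.use_predicate:
            # Like (INTEGER '.' INTEGER)=> : inspect all three tokens first.
            is_float = (self.kind(0), self.kind(1), self.kind(2)) == (
                "INTEGER", "DOT", "INTEGER")
        else:
            # Greedy: a '.' right after an INTEGER always starts a fraction.
            is_float = self.kind(0) == "INTEGER" and self.kind(1) == "DOT"
        whole = self.match("INTEGER")
        if is_float:
            self.match("DOT")
            return whole + "." + self.match("INTEGER")
        return whole

    def body(self):
        # statement (',' statement)* '.'  with statements reduced to numbers
        values = [self.number()]
        while self.kind() == "COMMA":
            self.match("COMMA")
            values.append(self.number())
        self.match("DOT")  # the terminating '.'
        if self.kind() != "EOF":
            raise ParseError(f"expected EOF, found {self.kind()}")
        return values


def parse_body(text, use_predicate):
    """Parse `number (',' number)* '.'` and return the numbers as strings."""
    return Parser(tokenize(text), use_predicate).body()


if __name__ == "__main__":
    print(parse_body("10001.", use_predicate=True))    # ['10001']
    print(parse_body("1.5, 42.", use_predicate=True))  # ['1.5', '42']
    try:
        parse_body("10001.", use_predicate=False)
    except ParseError as err:
        print("greedy:", err)  # greedy: expected INTEGER, found EOF
```

In greedy mode the terminating . of 10001. is swallowed as the start of a fraction, and parsing then fails because no INTEGER follows, which mirrors the problem described above. With the three-token check, 10001 is only treated as the start of a float when another INTEGER actually follows the '.'.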