Search code examples
parsingantlrantlr4lr-grammar

Why doesn't ANTLR "over-reduce" this expression?


I have the following grammar:

expr : factor op ;
op
    : '+' factor op
    | // Blank rule for left-recursion elimination
    ;

factor 
    : NUM
    | '(' expr ')'
    ;

NUM : ('0'..'9')+ ;

I supply 2 + 3, using expr as the start rule. The resulting parse tree from ANTLR is correct; however, I think I am misunderstanding the shift-reduce methods it uses.

I would expect the following to happen:

 Step # | Parse Stack   | Lookahead | Unscanned | Action
 1      |               | NUM       | + 3       | Shift
 2      | NUM           | +         | 3         | Reduce by factor -> NUM
 3      | factor        | +         | 3         | Shift a 'null'?
 4      | factor null   | +         | 3         | Reduce by op -> null
 5      | factor op     | +         | 3         | Reduce by expr -> factor op
 6      | expr          | +         | 3         | Shift
 7      | expr +        | NUM       |           | Shift
 8      | expr + NUM    |           |           | Reduce by factor -> NUM
 9      | expr + factor |           |           | ERROR (no production)

I would've expected an error to occur at step 3 wherin the parser would shift a null onto the stack as a prerequisite to reduceing the factor "up" to an expr.

Does ANTLR only shift a null when it's strictly "required" because the resulting reduce will satisfy the grammar?


Solution

  • It seems to me that ANTLR doesn't use a shift-reduce parser; the generated parsers are recursive descent using an arbitrary amount of lookahead.

    The steps of the parser would be something like:

    Rule          | Consummed | Input
    --------------+-----------+------
    expr          |           | 2 + 3
    ..factor      |           | 2 + 3
    ....NUM       | 2         | + 3
    ..op          | 2         | + 3
    ....'+'       | 2 +       | 3
    ....factor    | 2 +       | 3
    ......NUM     | 2 + 3     |
    ....op        | 2 + 3     |
    ......(empty) | 2 + 3     |
    

    From what I read about ANTLR, you could achieve the same result with the following changes to the original grammar:

    expr: factor op*;
    op: '+' factor;
    ...