Search code examples
parsingantlrgrammar

Basic parsing structure for nested structures?


I've come across the following bnf quite frequently for parsing a basic arithmetic expression:

S :== EXPRESSION
EXPRESSION :== TERM | TERM { [+,-] TERM] }
TERM :== FACTOR | FACTOR { [*,/] FACTOR] }
FACTOR :== number | '(' EXPRESSION ')'

-- or --

expression :    term | term + term | term − term
term :      factor | factor * factor | factor / factor
factor :    number | ( expression ) | + factor | − factor

This would parse something like 2+3-4*(1+2). However, what is required to parse something like 1+1+1+1, as the above factor cannot also refer to the expression production itself?


Solution

  • Your first version is correct, but not BNF. It uses "extended" syntax, which includes the repetition operator { ... }, which means "zero or more instances of the pattern inside the braces". So TERM { [+,-] TERM] } means TERM or TERM + TERM or TERM - TERM + TERM or TERM + TERM + TERM - TERM, etc.

    Your second version is not correct (and does not correspond to the first version). You don't provide any citation for it, so I can't tell if you copied it incorrectly or your source was simply wrong. Here's a correct version:

    expression: term | expression + term | expression − term
    term:       factor | term * factor | term / factor
    factor:     number | ( expression ) | + factor | − factor
    

    I hope it becomes clear how that analyses 1 + 1 + 1 + 1 into a left-associated sequence of additions.