Search code examples
parsinglanguage-designcontext-free-grammarlr-grammarambiguous-grammar

Is a context-free grammar that can be transformed into a LR parsing table unambiguous?


I know that in general it is undecidable whether a context-free grammar is unambiguous. However, this does not imply that this cannot be decided for a subset of context-free grammars.


A grammar is used to transform an input text into a parse tree. A grammar is ambiguous if it can produce more than one parse tree for a given input.

The LR parser algorithm first transforms the grammar into a LR parser table. Afterwards, it uses a LR parser automaton to process a given input stream into a parse tree using the LR parser table. The first step is usually done by a parser generator while the second step is executed for each parsing operation.

Consider the LR parser table construction algorithm construct(G) = T | error. The algorithm receives a context-free grammar G. If the table construction is successful a conflict-free LR parser table T is returned. If the table construction failed an error is returned. Examples for such an algorithm are SLR, LALR and CLR. Typical examples for the algorithm failing are shift-reduce and reduce-reduce conflicts.

For a finite input stream and a conflict-free LR parser table the LR parser automaton can deterministically derive one parse tree from a given input stream or return an error iff the input stream does not match the grammar. The parsing step can be formalized as parse(T, I) = O | error, where T is the conflict-free LR parsing table, I is the input stream of tokens and O is a single parse tree. The error is returned iff the input stream does not match the grammar.

Question

Summing up the statements above my understanding is that any grammar that can somehow be transformed into a conflict-free LR parser table is unambiguous. However, if the algorithm returns an error, this does not imply any statement about whether the grammar is ambiguous. Thus, a LR table construction algorithm semi-decides whether a subset of the context-free grammars is unambiguous. Is this correct?

Here are some steps where my conclusion might fall down:

  • the LR table construction algorithm might not terminate
  • the LR parser automaton might not terminate
  • the LR table construction algorithm is incorrect and so the LR parser automaton will accept not the exact same set of input streams as the grammar from which it was constructed or it derives another parse tree than the grammar.

To my knowledge none of the above is true for common LR table construction algorithms.

I was not able to find a clear statement about this, so I would greatly appreciate an explanation as well as a reference where this question is discussed and clearly answered.

I think this question is quite relevant when designing a programming language, since if my conclusion is correct a LR parser makes sure that every program written in the language can be properly parsed. Are there other methods to ensure that a grammar is unambiguous?


Solution

  • If an LR(k) parser can be generated for a grammar, then the grammar is certainly unambiguous. The LR(k) algorithm is deterministic; it will always terminate, either with an error or a correct parsing automaton. (Similarly for SLR(k) and LALR(k) parsers.)

    So you are correct: if the LR(k) parsing table generation algorithm terminates successfully, then the grammar is unambiguous, but if it terminates with an error the grammar may be unambiguous.

    I don't see how this could be generalized to "any" LR table generation algorithm, since an algorithm which purports to be such might be incorrect or non-terminating. But for the standard algorithms it is certainly correct.

    For a formal proof, you could consult Volume II of the classic textbook Parsing Theory by S. Sippu and E. Soisalon-Soininen (Springer-Verlag, 1990).