Search code examples
c++parsinggrammarformal-languages

Why can't C++ be parsed with a LR(1) parser?


I was reading about parsers and parser generators and found this statement in wikipedia's LR parsing -page:

Many programming languages can be parsed using some variation of an LR parser. One notable exception is C++.

Why is it so? What particular property of C++ causes it to be impossible to parse with LR parsers?

Using google, I only found that C can be perfectly parsed with LR(1) but C++ requires LR(∞).


Solution

  • There is an interesting thread on Lambda the Ultimate that discusses the LALR grammar for C++.

    It includes a link to a PhD thesis that includes a discussion of C++ parsing, which states that:

    "C++ grammar is ambiguous, context-dependent and potentially requires infinite lookahead to resolve some ambiguities".

    It goes on to give a number of examples (see page 147 of the pdf).

    The example is:

    int(x), y, *const z;
    

    meaning

    int x;
    int y;
    int *const z;
    

    Compare to:

    int(x), y, new int;
    

    meaning

    (int(x)), (y), (new int));
    

    (a comma-separated expression).

    The two token sequences have the same initial subsequence but different parse trees, which depend on the last element. There can be arbitrarily many tokens before the disambiguating one.