Search code examples
algorithmparsingcontext-free-grammarlalrlr-grammar

Is there a general way to convert an unambiguous context-free-grammar into a LALR(1) grammar?


I am trying to create a LALR(1) parser for the following grammar and find some shift/reduce conflicts.

S := expr
expr := lval | ID '[' expr ']' OF expr
lval := ID | lval '[' expr ']'

The conflicts

So the parser can not parse the string "ID[ID]" correctly. My questions are,

  1. Are there any general ways to convert such non-LALR(1) grammars into LALR(1) grammars?
  2. If two grammars generate exactly the same languages and we know that one is not LALR(1), can we know if the other is LALR(1)?

The grammar mentioned above is only an example, what I really want to know is general ways to solve these grammar problems. Any suggestions or reading recommendations are welcome.

Thanks in advance.


Solution

  • 1 . Are there any general ways to convert such non-LALR(1) grammars into LALR(1) grammars?

    No. It may or may not be possible to convert an arbitrary context-free grammar (CFG) into an LALR(1) grammar. Moreover, if you have a CFG and an LALR(1) grammar, you cannot tell whether they recognize the same language. (Worse, there is no algorithm which will even tell you whether an arbitrary CFG recognizes every possible string for its alphabet.)

    2 . If two grammars generate exactly the same languages and we know that one is not LALR(1), can we know if the other is LALR(1)?

    Again, no. As above, there is no algorithm which can verify that two grammars generate the same language, but even supposing that you know that two grammars generate the same language, the fact that one of them is not LALR(1) tells you nothing about the other one.

    There is, however, one useful result. If you have an LALR(k) grammar with finite k > 1, then you can generate an LALR(1) grammar. In other words, there is no such thing as an LALR(k) language for k > 1; if a language has an LALR(k) grammar, it has an LALR(k') grammar for any k' such that 1 ≤ k' < k.

    That doesn't help you with your grammar, however, because the conflict cannot be eliminated by increasing lookahead to any finite value.

    There is an easy way to get rid of that particular shift-reduce conflict, though, and it is a technique which often works. Consider the two conflicting rules:

    lval := lval '[' expr ']'
    expr := ID   '[' expr ']' OF expr
    

    The problem is that in the first case, ID must be reduced to lval immediately (or at least, before the following expr is reduced), but in the second case it may not be reduced to lval. But we can't tell which case we're in until we reduce the expr and encounter the OF (or not).

    If we could finish the lval production without doing the interior lval reduction, then we wouldn't have a problem, because the actual reduction would happen when the token following the ] is visible.

    There's probably a technical term for this, but I don't know it. I've always described it as "reduction deferral", and in many cases it is not very difficult:

    lval' := ID `[` expr `]`
          |  lval' `[` expr `]`
    lval  := ID
          |  lval'
    expr  := lval
          |  ID '[' expr ']' OF expr