Search code examples
parsinggrammaryacclalrlr-grammar

Is there a way to make this grammar LALR(1)?


I have a grammar with rules like this

A -> pq
B -> pr

Block -> { Astar Bstar }

Astar -> Astar A
       | epsilon

Bstar -> Bstar B
       | epsilon

Is there any way to turn this grammar into LALR(1)? From what I can make out, if the parser sees p inside a block, there's a shift/educe conflict.


Solution

  • Your language is regular, and is equivalent to the regular expression \{(pq)*(pr)*\}. The problem is that any simple conversion to a grammar is going to require two-character lookahead to see whether there's a q or r after the p

    Now you have this tagged both yacc and parsing so its not clear whether your looking for a practical answer of "how do you deal with this with yacc" or a theoretical answer "is there an LALR(1) grammar for this language."

    For the practical answer, the solution is that you punt -- you move the recognition for A and B into the lexer where you recognize the character sequences pq and pr as the tokens A and B. Since lex/flex uses a DFA and backtracks to the longest matched token, they have no problem with arbitrary lookahead here.

    For the theoretical answer, you need to transform the grammar to remove the need for lookahead. The problematic construct is (pq)*(pr)* (the braces are irrelevant) which is equivalent to p(qp)*(rp)*r | p(qp)*q | epsilon which suggests a grammar like:

    Block -> { p Astar Bstar r }
           | { p Astar q }
           | { }
    A -> q p
    B -> r p
    Astar -> Astar A | epsilon
    Bstar -> Bstar B | epsilon
    

    An alternate approach is unfactoring the star rules to make them not match empty strings:

    Block -> { Aplus Bplus }
           | { Aplus }
           | { Bplus }
           | { }
    A -> p q
    B -> p r
    Aplus -> Aplus A | A
    Bplus -> Bplus B | B
    

    giving you two very different LALR(1) grammars for the same language.