I have a grammar with rules like this
A -> pq
B -> pr
Block -> { Astar Bstar }
Astar -> Astar A
| epsilon
Bstar -> Bstar B
| epsilon
Is there any way to turn this grammar into LALR(1)? From what I can make out, if the parser sees p
inside a block, there's a shift/educe conflict.
Your language is regular, and is equivalent to the regular expression \{(pq)*(pr)*\}
. The problem is that any simple conversion to a grammar is going to require two-character lookahead to see whether there's a q
or r
after the p
Now you have this tagged both yacc
and parsing
so its not clear whether your looking for a practical answer of "how do you deal with this with yacc" or a theoretical answer "is there an LALR(1) grammar for this language."
For the practical answer, the solution is that you punt -- you move the recognition for A
and B
into the lexer where you recognize the character sequences pq
and pr
as the tokens A
and B
. Since lex/flex uses a DFA and backtracks to the longest matched token, they have no problem with arbitrary lookahead here.
For the theoretical answer, you need to transform the grammar to remove the need for lookahead. The problematic construct is (pq)*(pr)*
(the braces are irrelevant) which is equivalent to p(qp)*(rp)*r | p(qp)*q | epsilon
which suggests a grammar like:
Block -> { p Astar Bstar r }
| { p Astar q }
| { }
A -> q p
B -> r p
Astar -> Astar A | epsilon
Bstar -> Bstar B | epsilon
An alternate approach is unfactoring the star
rules to make them not match empty strings:
Block -> { Aplus Bplus }
| { Aplus }
| { Bplus }
| { }
A -> p q
B -> p r
Aplus -> Aplus A | A
Bplus -> Bplus B | B
giving you two very different LALR(1) grammars for the same language.