I'm working on a simple LL(1) parser generator, and I've run into an issue with PREDICT/PREDICT conflicts given certain input grammars. For example, given an input grammar like:
E → E + E
| P
P → 1
I can remove out the left recursion from E
, replacing it with a roughly equivalent right recursive rule, thus arriving at the grammar:
E → P E'
E' → + E E'
| ε
P → 1
Next, I can compute the relevant FIRST and FOLLOW sets for the grammar, and end up with the following:
FIRST(E) = { 1 }
FIRST(E') = { +, ε }
FIRST(P) = { 1 }
FOLLOW(E) = { +, EOF }
FOLLOW(E') = { +, EOF }
FOLLOW(P) = { +, EOF }
And finally, using PREDICT(A → α) = { FIRST(α) - ε } ∪ (FOLLOW(A) if ε ∈ FIRST(α) else ∅)
to construct the PREDICT sets for the grammar, the resulting sets are as follows.
PREDICT(1. E → P E') = { 1 }
PREDICT(2. E' → + E E') = { +, EOF }
PREDICT(3. E' → ε) = { +, EOF }
PREDICT(4. P → 1) = { 1 }
So this is where I run into the conflict that PREDICT(2) = PREDICT(3)
, and thus, I cannot produce a parse table as the grammar is not LL(1), since parser wouldn't be able to choose which rule should be applied.
What I'm really wondering is whether it's possible to resolve the conflict or factor the grammar such that the conflict can be avoided, and produce a legal LL(1) grammar, without having to directly modify the original input grammar.
The problem here is that your original grammar is ambiguous.
E → E + E
E → P
means that P + P + P
can be parsed either as (P + P) + P
or P + (P + P)
. Eliminating left recursion doesn't fix the ambiguity, so the modified grammar is also ambiguous. And ambiguous grammars can't be LL(k) (or, for that matter, LR(k)).
So you need to make the grammar unambiguous:
E → E + P
E → P
(That's the common left-associative version.) Once you eliminate left recursion, you end up with:
E → P E'
E' → + P E'
| ε
Now +
is not in FOLLOW(E').
(The example is drawn straight from the Dragon book, but simplified; it's example 4.8 in the rather battered old copy I have.)
It's worth noting that the transformation used here preserves the set of strings derived by the grammar, but not the derivation. The parse tree which results from the modified grammar is effectively right-associative, so it will need to be reprocessed to recover the desired parse. This fact is rather briefly mentioned by the Dragon book authors:
Although left-recursion elimination and left factoring are easy to do, they make the resulting grammar hard to read and difficult to use for translation purposes. (My emphasis)
They go on to suggest that operator precedence parsing can be used for expressions, and then mention that if an LR parser generator is available, dividing the grammar into a predictive part and an operator-precedence part is no longer necessary.