Search code examples
parsingcompiler-constructionll-grammar

Why is this grammar LL(1) even though all the FIRST sets are the same?


Consider the following CFG:

S := AbC

A := aA | epsilon

C := Ac

Here, FIRST(A) = FIRST(B) = FIRST(C) = {a, ε}, so all the FIRST sets are the same. However, this grammar is supposedly LL(1). How is that possible? Wouldn't that mean that there would be a bunch of FIRST/FIRST conflicts everywhere?


Solution

  • There's nothing fundamentally wrong about having multiple nonterminals that have the same FIRST sets. Things only become problematic if you have multiple nonterminals with overlapping FIRST or FOLLOW sets in a context where you have to choose between a number of production options.

    As an example, consider this simple grammar:

    A → bB | cC

    B → b | c

    C → b | c

    Notice that all three of A, B, and C have the same FIRST set, namely {b, c}. But this grammar is also LL(1). While you can formally convince yourself of this by writing out the actual LL(1) parsing table, you can think of this in another way as well. Imagine you're reading the nonterminal A, and you see the character b. Which production should you pick: A → bB, or A → cC? Well, there's no reason to pick A → cC, because that would put c at the front of your string. So don't pick that one. Instead, pick A → bB. Similarly, suppose you're reading an A and you see the character c. Which production should you pick? You'd never want to pick A → bB, since that would put b at the front of your string. Instead, you'd pick A → cC.

    Notice that in this discussion, we never stopped to think about what FIRST(B) or FIRST(C) was. It simply didn't come up because we never needed to know what characters could be produced there.

    Now, let's look at your example. If you're trying to expand an S, there's only one possible production to apply, which is S → AbC. So there's no possible conflict here; when you see S, you always apply that rule. Similarly, if you're trying to expand a C, there's no choice of what to do. You have to expand C → Ac.

    So now let's think about the nonterminal A, where there really is a choice of what to do next. If you see the character a, then we have to decide - do we expand out A → aA, or do we expand out A → ε? In answering that question, we have to think about the FOLLOW set of A, since the production A → ε would only make sense to pick if we saw a terminal symbol where we basically just want to get A out of the way. Here, FOLLOW(A) = {b, c}, with the b coming from the production S → AbC and the c coming from the production C → Ac. So we'd only pick A → ε if we see b or c, not if we see a. That means that

    • on reading a, we pick A → aA, and
    • on reading b o r c, we pick A → ε.

    Notice that in this discussion we never needed to think about what FIRST(B) or FIRST(C) was. In fact, we never even needed to look at what FIRST(A) was either! So that's why there isn't necessarily a conflict. Were we to encounter a scenario where we needed to compare FIRST(A) against FIRST(B) or something like that, then yes, we'd definitely have a conflict. But that never came up, so no conflict exists.