I found an exercise that requires a trick to determine whether a grammar is LR(1) without doing any parsing-table operations.
The grammar is the following:
S -> Aa | Bb
A -> aAb | ab
B -> aBbb | abb
Do you know what the trick behind it is?
Thanks :)
Imagine that you're an LR(1) parser and that you've just read aab with a lookahead of b. (I know, you're probably thinking "man, that happens to me all the time!") What exactly should you do here?
Looking at the grammar, you can't tell whether the initial production was Aa or Bb, so you're going to have to simultaneously consider production rules for A and for B. If you look at the A options, you'll see that one option here would be to reduce A → ab, which is plausible here because the lookahead is a b and that's precisely what you'd expect to find after seeing an ab when expanding out an A (notice that there's the rule A → aAb, so any recursively-expanded As would be followed by a b). So that tells you to reduce. On the other hand, look at the B options. If you see aab followed by a b, you'd be thinking "oh, that second b is going to make aabb, and then I'd go and reduce B → abb, because that's totally a thing I like to do because I'm an LR(1) parser." So that tells you to shift. At that point, bam! You've got a shift/reduce conflict, so you're almost certainly not going to have an LR(1) grammar.
So does that actually happen? Well, let's go build the LR(1) configurating sets that we'd see if we did indeed read aab and see b as a lookahead:
Initial State
S' -> .S [$]
S -> .Aa [$]
S -> .Bb [$]
A -> .aAb [a]
A -> .ab [a]
B -> .aBbb [b]
B -> .abb [b]
State after reading a
A -> a.Ab [a]
A -> a.b [a]
A -> .aAb [b]
A -> .ab [b]
B -> a.Bbb [b]
B -> a.bb [b]
B -> .aBbb [b]
B -> .abb [b]
State after reading aa
A -> a.Ab [b]
A -> a.b [b]
A -> .aAb [b]
A -> .ab [b]
B -> a.Bbb [b]
B -> a.bb [b]
B -> .aBbb [b]
B -> .abb [b]
State after reading aab
A -> ab. [b]
B -> ab.b [b]
And hey! There's that shift/reduce conflict we were talking about: the first item reduces on b, while the second shifts on b. So there you go! Our intuition said this wasn't going to be an LR(1) grammar, and the configurating sets bear that out.
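If you want to check this mechanically, the closure/goto construction above is small enough to script. Here's a minimal sketch (my own hypothetical helper names, not a full parser generator; it leans on the fact that this grammar has no ε-productions, so FIRST of a suffix is just FIRST of its first symbol):

```python
# LR(1) items are tuples (lhs, rhs, dot position, lookahead).
GRAMMAR = {
    "S'": [["S"]],
    "S":  [["A", "a"], ["B", "b"]],
    "A":  [["a", "A", "b"], ["a", "b"]],
    "B":  [["a", "B", "b", "b"], ["a", "b", "b"]],
}

def is_nonterminal(sym):
    return sym.isupper()  # uppercase (incl. S') = nonterminal

def first(sym):
    # No epsilon productions in this grammar, so FIRST is simple.
    if not is_nonterminal(sym):
        return {sym}
    out = set()
    for prod in GRAMMAR[sym]:
        out |= first(prod[0])
    return out

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, la) in list(items):
            if dot < len(rhs) and is_nonterminal(rhs[dot]):
                rest = rhs[dot + 1:]
                # Lookaheads for the expanded nonterminal: FIRST(rest · la).
                las = first(rest[0]) if rest else {la}
                for prod in GRAMMAR[rhs[dot]]:
                    for l in las:
                        item = (rhs[dot], tuple(prod), 0, l)
                        if item not in items:
                            items.add(item)
                            changed = True
    return frozenset(items)

def goto(items, sym):
    moved = [(lhs, rhs, dot + 1, la) for (lhs, rhs, dot, la) in items
             if dot < len(rhs) and rhs[dot] == sym]
    return closure(moved)

# Walk the states for the prefix a, a, b and look for a conflict.
state = closure([("S'", ("S",), 0, "$")])
for sym in "aab":
    state = goto(state, sym)

reduces = {la for (lhs, rhs, dot, la) in state if dot == len(rhs)}
shifts = {rhs[dot] for (lhs, rhs, dot, la) in state
          if dot < len(rhs) and not is_nonterminal(rhs[dot])}
print("shift/reduce conflict on:", reduces & shifts)  # → {'b'}
```

Running this reproduces exactly the final state listed above: A → ab. [b] wants to reduce on b while B → ab.b [b] wants to shift it.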
So how would you know to try that? Well, in general, it's pretty hard to do this. The main cue, for me at least, is that the parser has to guess whether it wants A or B at some point, but the way it breaks the tie is by counting bs. The parser was at some point going to have to decide whether it likes ab and goes with A, or whether it likes abb and goes with B, but it can't see both of the bs before making that decision. That led me to look for a conflict where we've seen enough input to know that some recursion was happening (so that the trailing b would cause a problem) and where the recursion differs between the two production rules.