context-free-grammar shift-reduce-conflict

Is it possible to construct an LR(0) state machine for a recursive grammar?

Consider the following grammar:

S -> aA
A -> Aa|b

This grammar generates an infinite language. Is it possible to construct an LR(0) state machine for this grammar?

Solution

Grammars don't have "a" LR(0) state. You can construct an LR(0) state machine for the grammar; that machine has several states.

Here's an LR(0) state machine for your grammar, produced with Grammophone:

The state machine above does not show reduction transitions, but they can be deduced from the items in each state; a state with an item with the • at the end is a reduction state (states 1, 3, 4 and 5). Since an LR(0) parser may not consult the next token to decide whether to take a reduction, the grammar is not LR(0); state 3 has both a transition and a reduction [Note 1].

Although the grammar is not LR(0), the LR(0) state machine is still important because the same state machine is used by SLR(1) and LALR(1) parsers, which have exactly the same states. Constructing SLR(1) and LALR(1) parsers therefore starts with the construction of the LR(0) state machine. The difference is that SLR(1) and LALR(1) parsers do use (1) lookahead symbol to determine the reduction action. With either of these algorithms, the conflict in state 3 will be resolved because the reduction is associated only with a lookahead of b, for which there is no transition in the state machine.

Canonical LR(1) parsers do not use the same state machine (in most cases); in a CLR parser, it's possible to have two states with the same set of items. That sometimes allows a conflict to be resolved. But in this grammar it is not necessary.

Notes

A language can only be LR(0) if it has the prefix property; in other words, that no sentence is a prefix of another sentence. (It might have been better to call this the "no prefix property", but that's not so easy to talk about.) The easiest way to give a language the prefix property is to add an end-of-input marker to every input (usually symbolised $). The end-of-input marker must be a new symbol which doesn't appear anywhere in the grammar. Since every sentence in the new language ends with a $ and no sentence contains a $ other than at the end, it's impossible for a sentence to be the prefix of another sentence.

So to make this grammar LR(0), it's only necessary to change the rule S -> a A to S -> a A $. This resolves the shift-reduce conflict in state 3, and still produces an infinite language.