Search code examples
parsingcontext-free-grammarregex-lookaroundsdfalr-grammar

LR(1) Item DFA - Computing Lookaheads


I have trouble understanding how to compute the lookaheads for the LR(1)-items.

Lets say that I have this grammar:

S -> AB
A -> aAb | a
B -> d

A LR(1)-item is an LR(0) item with a lookahead. So we will get the following LR(0)-item for state 0:

S -> .AB , {lookahead} 
A -> .aAb,  {lookahead} 
A -> .a,  {lookahead}

State: 1

A ->  a.Ab, {lookahead} 
A ->  a. ,{lookahead} 
A -> .aAb ,{lookahead} 
A ->.a ,{lookahead}

Can somebody explain how to compute the lookaheads ? What is the general approach ?

Thank you in advance


Solution

  • The lookaheads used in an LR(1) parser are computed as follows. First, the start state has an item of the form

    S -> .w  ($)
    

    for every production S -> w, where S is the start symbol. Here, the $ marker denotes the end of the input.

    Next, for any state that contains an item of the form A -> x.By (t), where x is an arbitrary string of terminals and nonterminals and B is a nonterminal, you add an item of the form B -> .w (s) for every production B -> w and for every terminal in the set FIRST(yt). (Here, FIRST refers to FIRST sets, which are usually introduced when talking about LL parsers. If you haven't seen them before, I would take a few minutes to look over those lecture notes).

    Let's try this out on your grammar. We start off by creating an item set containing

    S -> .AB ($)
    

    Next, using our second rule, for every production of A, we add in a new item corresponding to that production and with lookaheads of every terminal in FIRST(B$). Since B always produces the string d, FIRST(B$) = d, so all of the productions we introduce will have lookahead d. This gives

    S -> .AB ($)
    A -> .aAb (d)
    A -> .a (d)
    

    Now, let's build the state corresponding to seeing an 'a' in this initial state. We start by moving the dot over one step for each production that starts with a:

    A -> a.Ab (d)
    A -> a. (d)
    

    Now, since the first item has a dot before a nonterminal, we use our rule to add one item for each production of A, giving those items lookahead FIRST(bd) = b. This gives

    A -> a.Ab (d)
    A -> a. (d)
    A -> .aAb (b)
    A -> .a (b)
    

    Continuing this process will ultimately construct all the LR(1) states for this LR(1) parser. This is shown here:

    [0]
    S -> .AB  ($)
    A -> .aAb (d)
    A -> .a   (d)
    
    [1]
    A -> a.Ab (d)
    A -> a.   (d)
    A -> .aAb (b)
    A -> .a   (b)
    
    [2]
    A -> a.Ab (b)
    A -> a.   (b)
    A -> .aAb (b)
    A -> .a   (b)
    
    [3]
    A -> aA.b (d)
    
    [4]
    A -> aAb. (d)
    
    [5]
    S -> A.B  ($)
    B -> .d   ($)
    
    [6]
    B -> d.   ($)
    
    [7]
    S -> AB.  ($)
    
    [8]
    A -> aA.b (b)
    
    [9]
    A -> aAb. (b)
    

    In case it helps, I taught a compilers course last summer and have all the lecture slides available online. The slides on bottom-up parsing should cover all of the details of LR parsing and parse table construction, and I hope that you find them useful!

    Hope this helps!