Search code examples
parsingsyntaxcompiler-constructiongrammarproduction

How to deduce the Followers of non-terminals in syntax productions?


Here is a set of Syntax Productions:

S -> SA | T
A -> +S | S | *
T -> (S) | a 

when eliminate left recursion, I got these:

S -> TB
B -> AB | ε
A -> +S | TB | *
T -> (S) | a

and then I try to get the Firsts and Follows of the non-terminals in it following the steps the dragon book described. I have get Firsts correctly:

first(T) = [(, a]
first(A) = [+, *, (, a]
first(B) = [ε, +, *, (, a]
first(S) = [(, a]

But I can't figure out how get Follows. So can any one show how to do this specifically?


Solution

  • To compute the FOLLOW(Non-Terminal), apply the following rules until nothing can be added to the FOLLOW set : - // Taken from Dragon Book

    1. Place $ in FOLLOW(Start_Symbol),where $ is the input right endmarker.
    2. If there is a production A->XBY, then everything in FIRST(Y) except ε is in FOLLOW(B).
    3. If there is a production A->XB, or a production A->XBY, where FIRST(Y) contains ε,then everything in FOLLOW(A) is in FOLLOW(B).

    So,now going as per the rules, we derive the FOLLOW() of all the non-terminals here .

    FOLLOW(S) = {),$} Since S is the start symbol, FOLLOW(S) must contain $.The production number 4th body (S) explains why the right parentheses is in FOLLOW(S).

    FOLLOW(B) = {),$} Since B appears only at the ends of bodies of S-productions(and not A-productions,discussed later in DISCUSSION Section).

    FOLLOW(A) = {+,*,(,a,),$} Since A appears in bodies only followed by B, thus,everything except ε that is in FIRST(B) must be in FOLLOW(A). However, since FIRST(B) contains ε, and B is derived from S in the initial production, hence, everything in FOLLOW(S) must also be in FOLLOW(A).That explains the symbols $ and ).

    FOLLOW(T) = {+,*,(,a,),$} Since T appears in bodies only followed by B, thus,everything except ε that is in FIRST(B) must be in FOLLOW(T). However, since FIRST(B) contains ε,and B is the entire-string following T in the bodies of the S productions,everything in FOLLOW(S) must also be in FOLLOW(T).That explains the symbols $ and ).

    DISCUSSION :-

    Now,talking about some kind of conflict arising because of 3rd production,A -> +S | TB | *, you could've easily substituted S in place of TB, because of which you may be confused.

    As per me, you should leave the productions as :-

    S -> TB
    B -> AB | ε
    A -> +S | S | *  //substitute S for understanding FOLLOW(B) without any confusion
    T -> (S) | a