Search code examples
compiler-constructioncontext-free-grammar

How to find FIRST and FOLLOW


I want to find FIRST and FOLLOW for the following CFG:

S -> O | M
M -> iEtMeM | a
O -> iEtS | iEtMeO
E -> b

I did the left factorization, so I get:

S -> O | M
M -> iEtMeM | a
O -> iEtO'
O'-> S | MeO
E -> b

The FIRST sets:

FIRST(S) = FIRST(O)|FIRST(M) = {i,a}
FIRST(M) = {i,a}
FIRST(O) = {i}
FIRST(O') = FIRST(S)|FIRST(M) = {i,a}
FIRST(E) = {b}

And now I cant find the FOLLOW set for S , because I don't know what FOLLOW(O') is:

FOLLOW(S) = {$, FOLLOW(O')}

Solution

  • Actually FOLLOW(S) = {$} only.

    So, I overlooked that S is mentioned on a right-hand side. Corrections below:

    First we get the augmented grammar by adding the production S' ->S$, then FOLLOW(S') = {$}.

    Then we have

    • from S' -> S$ and O' -> S

      FOLLOW(S) = FIRST($) + FOLLOW(O')

    • from M -> iEtMeM, O' -> MeO, and S -> M

      FOLLOW(M) = FIRST(eM) + FIRST(eO) + FOLLOW(S)

    • from S -> O and O' -> MeO

      FOLLOW(O) = FOLLOW(S) + FOLLOW(O')

    • from O -> iEtO'

      FOLLOW(O') = FOLLOW(O)

    • from M -> iEtMeM and O -> iEtO'

      FOLLOW(E) = FIRST(tMeM) + FIRST(tO')

    The "problem" are the mutually recursive definitions for FOLLOW(S), FOLLOW(O), and `FOLLOW(O') - that means that each of these follow sets is a subset of the others, hence they are equal.

    If one represent the set inclusion constraints, imposed by the above equations, as a graph (with non-terminal symbols as nodes), each set of mutually recursive definitions forms a strongly-connected component. Replacing each SCC with a new node will result in a DAG, representing a set of non-recursive equations, which can then by "evaluated" in topological order.

    Say, we replace nodes, corresponding to symbols S, O and O' with node N. The equations become:

    FOLLOW(N) = FIRST($) + FOLLOW(N)
    FOLLOW(M) = FIRST(eM) + FIRST(eO) + FOLLOW(N)
    FOLLOW(N) = FOLLOW(N) + FOLLOW(N)
    FOLLOW(N) = FOLLOW(N)
    FOLLOW(E) = FIRST(tMeM) + FIRST(tO')
    

    and by cutting off the redundant parts:

    FOLLOW(N) = FIRST($) = {$}
    FOLLOW(M) = FIRST(eM) + FIRST(eO) + FOLLOW(N) = {e, $}
    FOLLOW(E) = FIRST(tMeM) + FIRST(tO') = {t}
    

    and, since N stands for either S, O, or O' we get:

    FOLLOW(S`) = FOLLOW(S) = FOLLOW(O) = FOLLOW(O`) = {$}
    FOLLOW(M) = {e, $}
    FOLLOW(E) = {t}