Search code examples
grammarcontext-free-grammarchomsky-normal-form

Converting grammar to Chomsky Normal Form?


Convert the grammar below into Chomsky Normal Form. Give all the intermediate steps.

S -> AB | aB
A -> aab|lambda
B -> bbA

Ok so the first thing I did was add a new start variable S0

so now I have

S0 -> S
S -> AB | aB
A -> aab|lambda
B -> bbA

then I removed all of the lambda rules:

S0 -> S
S -> AB | aB | B
A -> aab
B -> bbA | bb

Then I checked for S->S and A->B type rules which did not exist. And that was the answer I came up with, do I need to do anything further or did I do anything wrong?


Solution

  • Wikipedia says:

    In computer science, a context-free grammar is said to be in Chomsky normal form if all of its production rules are of the form:

    • A -> BC, or
    • A -> α, or
    • S -> ε

    where A, B, C are nonterminal symbols, α is a terminal symbol, S is the start symbol, and ε is the empty string. Also, neither B nor C may be the start symbol.

    Continuing your work:

    S0 -> S
    S -> AB | aB | B
    A -> aab
    B -> bbA | bb
    

    Instead of using | to denote different choices, split a rule into multiple rules.

    S0 -> S
    S -> AB
    S -> aB
    S -> B
    A -> aab
    B -> bbA
    B -> bb
    

    Create new rules Y -> a and Z -> b because we will need them soon.

    S0 -> S
    S -> AB
    S -> aB
    S -> B
    A -> aab
    B -> bbA
    B -> bb
    Y -> a
    Z -> b
    

    S -> aB is not of the form S -> BC because a is a terminal. So change a into Y:

    S0 -> S
    S -> AB
    S -> YB
    S -> B
    A -> aab
    B -> bbA
    B -> bb
    Y -> a
    Z -> b
    

    Do the same for the B -> bb rule:

    S0 -> S
    S -> AB
    S -> YB
    S -> B
    A -> aab
    B -> bbA
    B -> ZZ
    Y -> a
    Z -> b
    

    For A -> aab, create C -> YY; for B -> bbA, create D -> ZZ:

    S0 -> S
    S -> AB
    S -> YB
    S -> B
    A -> CZ
    C -> YY
    B -> DA
    D -> ZZ
    B -> ZZ
    Y -> a
    Z -> b
    

    For S -> B, duplicate the one rule where S occurs on the right hand side and inline the rule:

    S0 -> B
    S0 -> S
    S -> AB
    S -> YB
    A -> CZ
    C -> YY
    B -> DA
    D -> ZZ
    B -> ZZ
    Y -> a
    Z -> b
    

    Deal with the rules S0 -> B and S0 -> S by joining the right hand side to the left hand sides of other rules. Also, delete the orphaned rules (where the LHS symbol never gets used on RHS):

    S0 -> DA
    S0 -> ZZ
    S0 -> AB
    S0 -> YB
    A -> CZ
    C -> YY
    B -> DA
    D -> ZZ
    B -> ZZ
    Y -> a
    Z -> b
    

    And we're done. Phew!