Search code examples
context-free-grammarchomsky-normal-form

How do you do the last step when converting a CFG to a CNF?


Given:

    S->AcA|BcB
    A->ccBc|ABA|cc
    B->c

step1

    S0->S
    S->AcA|BcB
    A->ccBc|ABA|cc
    B->c


step2             // change symbol to terminals?

    S0->S
    S->ABA|BBB
    A->BBBB|ABA|BB
    B->c

step3             // split?

    S0->S
    S->ABA
    S->BBB
    A->BBBB
    A->ABA
    A->BB
    B->c

step4             // what to do when A->AXA?

    S0->S
    S->ABA
    S->BBB
    A->BBBB   //??
    A->ABA    //??
    A->BB     //??
    B->c

I'm not sure how to continue.


Solution

  • There is no step 2 for you. Step 2 is removing epsilon rules, and you have no epsilon rules.

    You don't have a step 3 either, because B->c has a terminal - not a nonterminal - on its right hand side. There are no unit rules of the form:

    Terminal -> Terminal.
    

    This leaves us after your step 1:

    S0 -> S
    S -> AcA|BcB
    A -> ccBc|ABA|cc
    B -> c
    

    You need to get the rest of your rules in the form of:

    X -> YZ //where X,Y,Z are all nonterminals
    

    To convert a string of nonterminals and terminals into the above form, you take the first nonterminal or terminal from the front, and then you turn the rest of the string into a new rule, and use that rule on the end. I didn't explain that very well, so let's look at an example.

    //To convert
    S -> AcA
    //we split it into A and cA, and define a new rule C -> cA, giving:
    S -> AC
    C -> cA
    //Then, C -> cA needs to be converted to the same form, so just replace c with B
    S -> AC
    C -> BA
    

    However, throw out the above, because first I'm going to just change all the cs to B, since that will happen anyway:

    S0 -> S
    S -> ABA|BBB
    A -> BBBB|ABA|BB
    B -> c
    

    Now that I look at your question again, this is where you were at. You'd gone one step further to:

    S0 -> S
    S -> ABA
    S -> BBB
    A -> BBBB
    A -> ABA
    A -> BB
    B -> c
    

    If we take the first S, and use the method described above, we get:

    S -> ABA
    //goes to
    S -> AC
    C -> BA
    

    Doing the rest of the rules, we get:

    //second S
    S -> BD
    D -> BB
    
    //first A
    A -> DD
    
    //second A:
    A -> AC
    

    Combining this all, we get:

    S0 -> S
    S -> AC
    S -> BD
    A -> DD
    A -> AC
    A -> BB
    B -> c
    C -> BA
    D -> BB