Search code examples
grammarformal-languages

Can someone help me Convert Grammar to regular expresion


Can someone help me convert grammar to regular expression and explain how to do it on some complex grammars?

S -> aA | bB
A -> aC | bC | a | b
B -> aC | bC | a | b
C -> aA

Solution

  • This grammar of yours:

    S -> aA | bB
    A -> aC | bC | a | b
    B -> aC | bC | a | b
    C -> aA
    

    Happens to be a right-regular grammar. As such, it describes a regular language and can easily be converted to a deterministic finite automaton:

    q    s    q'
    S    a    A
    S    b    B
    A    a    (C)
    A    b    (C)
    B    a    (C)
    B    b    (C)
    (C)  a    A
    (C)  b    [D]
    [D]  a    [D]
    [D]  b    [D]
    

    Here, S is the initial state, C is accepting and D is dead. To get a regular expression, treat this automaton as a NFA and start replacing symbol labels with regular expressions eliminating states:

    S = aA + bB
    A = aC + bC + a + b
    B = aC + bC + a + b
    C = aA + bD
    D = aD + bD
    

    Notice that D = aD + bD = (a + b)D is only true if D = {}. Rewriting:

    S = aA + bB
    A = aC + bC + a + b
    B = aC + bC + a + b
    C = aA
    

    Now we can safely eliminate C by substitution:

    S = aA + bB
    A = aaA + baA + a + b = (aa + ba)A + (a + b)
    B = aaA + baA + a + b = (aa + ba)A + (a + b)
    

    Now we can use the rule Z = xZ + y => Z = x*y:

    S = aA + bB
    A = (aa + ba)*(a + b)
    B = aaA + baA + a + b = (aa + ba)A + (a + b)
    

    Now substitute:

    S = aA + bB
    A = (aa + ba)*(a + b)
    B = (aa + ba)(aa + ba)*(a + b) + (a + b)
    

    Now again:

    S = a(aa + ba)*(a + b) + b((aa + ba)(aa + ba)*(a + b) + (a + b))
    

    Now we can group like terms:

    S = (a(aa + bb)* + b(aa + ba)(aa + ba)* + b)(a + b)
    

    Notice that b(aa + ba)(aa + ba)* + b can be factored b((aa + ba)(aa + ba)* + e) and that (aa + ba)(aa + ba)* + e = (aa + ba)*:

    S = (a(aa + bb)* + b(aa + ba)*)(a + b)
    

    We can factor again:

    S = (a + b)(aa + ba)*(a + b)
    

    This last expression is a good, concise, correct regular expression for your language. It basically encodes the idea "see one symbol, then see another symbol, but if you see a third symbol it had better not be a b".