Can someone help me convert grammar to regular expression and explain how to do it on some complex grammars?
S -> aA | bB
A -> aC | bC | a | b
B -> aC | bC | a | b
C -> aA
This grammar of yours:
S -> aA | bB
A -> aC | bC | a | b
B -> aC | bC | a | b
C -> aA
Happens to be a right-regular grammar. As such, it describes a regular language and can easily be converted to a deterministic finite automaton:
q s q'
S a A
S b B
A a (C)
A b (C)
B a (C)
B b (C)
(C) a A
(C) b [D]
[D] a [D]
[D] b [D]
Here, S is the initial state, C is accepting and D is dead. To get a regular expression, treat this automaton as a NFA and start replacing symbol labels with regular expressions eliminating states:
S = aA + bB
A = aC + bC + a + b
B = aC + bC + a + b
C = aA + bD
D = aD + bD
Notice that D = aD + bD = (a + b)D
is only true if D = {}
. Rewriting:
S = aA + bB
A = aC + bC + a + b
B = aC + bC + a + b
C = aA
Now we can safely eliminate C
by substitution:
S = aA + bB
A = aaA + baA + a + b = (aa + ba)A + (a + b)
B = aaA + baA + a + b = (aa + ba)A + (a + b)
Now we can use the rule Z = xZ + y => Z = x*y
:
S = aA + bB
A = (aa + ba)*(a + b)
B = aaA + baA + a + b = (aa + ba)A + (a + b)
Now substitute:
S = aA + bB
A = (aa + ba)*(a + b)
B = (aa + ba)(aa + ba)*(a + b) + (a + b)
Now again:
S = a(aa + ba)*(a + b) + b((aa + ba)(aa + ba)*(a + b) + (a + b))
Now we can group like terms:
S = (a(aa + bb)* + b(aa + ba)(aa + ba)* + b)(a + b)
Notice that b(aa + ba)(aa + ba)* + b
can be factored b((aa + ba)(aa + ba)* + e)
and that (aa + ba)(aa + ba)* + e = (aa + ba)*
:
S = (a(aa + bb)* + b(aa + ba)*)(a + b)
We can factor again:
S = (a + b)(aa + ba)*(a + b)
This last expression is a good, concise, correct regular expression for your language. It basically encodes the idea "see one symbol, then see another symbol, but if you see a third symbol it had better not be a b".