I am trying to write a regex query for a language with letters a,b,c such that a is never adjacent to b.
Can it be done by using only the alternation (plus), concatenation and repetition (multiplication) operators?
L = w belongs to {a,b,c}* such that a is never adjacent to b
(Lets see if I recall enough formal language theory.)
Such a regular expression could be built with help of a DFA like this:
A = aA + cC + F // only a or c can follow a
B = bB + cC + F // only b or c can follow b
C = cC + aA + bB + F // any char can follow c
Where A
, B
and C
are states representing the state when a
, b
and c
respectively was the previous character. Since any character can follow c
we can make C
our start state. F
being the final end state (end of string).
This DFA can be converted to a regular expression like this:
A = a*(cC+F) // eliminate recursion
B = b*(cC+F) // eliminate recursion
C = cC + aA + bB + F
= cC + aa*(cC+F) + bb*(cC+F) + F // substitute A and B
= (c + aa*c + bb*c)C + aa*F + bb*F + F // regroup
= (c + aa*c + bb*c)*(aa*F + bb*F + F) // eliminate recursion
= (c + aa*c + bb*c)*(aa* + bb* + e)F // regroup
So the expression would be:
(c + aa*c + bb*c)*(aa* + bb* + e) // e being the empty/null string
Or in informal regex format:
(c|a+c|b+c)*(a+|b+)?
Which can be shortened to:
(a+c|b*c)*(a*|b*)