Search code examples
regexformal-languages

Formal regular expression for a language over a,b,c such that a is never adjacent to b


I am trying to write a regex query for a language with letters a,b,c such that a is never adjacent to b.

Can it be done by using only the alternation (plus), concatenation and repetition (multiplication) operators?

L = w belongs to {a,b,c}* such that a is never adjacent to b


Solution

  • (Lets see if I recall enough formal language theory.)

    Such a regular expression could be built with help of a DFA like this:

    A = aA + cC + F      // only a or c can follow a
    B = bB + cC + F      // only b or c can follow b
    C = cC + aA + bB + F // any char can follow c
    

    Where A, B and C are states representing the state when a, b and c respectively was the previous character. Since any character can follow c we can make C our start state. F being the final end state (end of string).

    This DFA can be converted to a regular expression like this:

    A = a*(cC+F) // eliminate recursion
    B = b*(cC+F) // eliminate recursion
    
    C = cC + aA + bB + F
      = cC + aa*(cC+F) + bb*(cC+F) + F       // substitute A and B
      = (c + aa*c + bb*c)C + aa*F + bb*F + F // regroup
      = (c + aa*c + bb*c)*(aa*F + bb*F + F)  // eliminate recursion
      = (c + aa*c + bb*c)*(aa* + bb* + e)F   // regroup
    

    So the expression would be:

    (c + aa*c + bb*c)*(aa* + bb* + e) // e being the empty/null string
    

    Or in informal regex format:

    (c|a+c|b+c)*(a+|b+)?
    

    Which can be shortened to:

    (a+c|b*c)*(a*|b*)