Search code examples
regular-languagefinite-automata

Build regular expression


I need to build a regular expression for strings in which the +/- characters cannot stand side by side (must be separated by some other). I got this option: (a*(+|-)a)* , where a is any character, * is the Kleene closure, () is for clarity. But this expression does not recognize lines of the form: "+", "-","+ a-" etc. Maybe someone will be able to move me from the dead point. I need regularity to build a finite automaton.


Solution

  • That might do:

    ^(\+(?![\+-])|-(?![\+-])|[^\+-])*?$
    

    It bounds everything to the single line: ^ to $. The lazy quantifier ()*? ensures that no more then one line is going to be recognized.

    The 3 concatenations inside the parentheses are as follows:

    • \+(?![\+-]) if the character is + the next one must not be + or -
    • -(?![\+-]) if the character is - the next one must not be + or -
    • the previous two have the second character only looked-ahead, and could be combined into one concatenation: [\+-](?![\+-]).
    • [^\+-] any character that is not + and -

    However, you must know that a regex is more powerful than a regular expression. You need a regular grammar more than a regular expression:

    S = +T
    S = -T
    S = @S
    S = ε
    
    T = @S
    T = ε
    

    This grammar is right-regular, and @ is any character that is not + nor -. ε is epsilon = nothing.

    Here is the deterministic finite automaton (where P=+ M=- @=not + and not -):

      /---- @ ----\
      |           |
      |           v
    (S = start/final)--- P,M -->(T = final)---- P,M --->(error)
          ^                          |         
          |                          |         
          \----------- @ ------------/