It seems that most CS textbooks in Automata theory cover Regular Expressions with the alphabet Σ = {0, 1} or Σ = {a, b}.
Many students in Automata class had trouble writing RegEx's, is there a parser that accepts something like the following examples? Perl RegEx and similar syntax is too dissimilar to be useful.
Some examples:
(0+1)* # All words in the language
(0+1)((0+1)(0+1))* # All words of odd length
0(0+1)*1 # Words starting with 0 and ending with 1
0*+0*10*+0*10*10* # Has at most two 1's
(0+10)(0+1)*(1+10) # Begins with 0 or 10 and ends with 1 or 10
(1+011)* # Every 0 followed by two 1's
In the syntax used in this class and several textbooks, a *
indicates matching 0+ times, and a +
indicates OR.
Does something out there exist to do this, or should I built my own parser?
It doesn't look that different from standard regex... the only change you would have to make is to swap +
with |
(change (0+1)
to (0|1)
). Apart from that, you would just have to make the resulting regex match the entire line, either by prepending ^
and suffixing $
or by setting the appropriate option.
Shouldn't be more than a couple lines to wrap the standard regex parser.