I am working on a thinking exercise handed out by my professor at the end of a lecture. The problem is to construct a DFA given a specific language definition. Before I construct the DFA, the first thinking exercise is to convert the language definition into a regular expression.
The provided alphabet is binary {0, 1}
The language definition is quite informal:
The language defining the set of binary strings in which every sub-string of length 3 has as least one zero
So examples of strings that match this definition would be 000
, 001
, 1010
and so forth.
My trouble is coming up with a regular expression to match this language definition. I tried playing around on http://regexr.com/ but I only found that '..0' matches every three characters with a zero at the end. I'm not sure how to match every sub-string in the way the language is defined, or if it is even possible.
Is there a way to construct a regular expression for this problem?
Lateral thinking required. Don't implement the regex for the informal language definition, but for the property that that definition implies.
Spoiler (hover over it for the solution):
Hint 1:
If any arbitrary 3-length substring must have a
0
-digit, then it is impossible to have 3 digits in a row that are1
-digits.
Hint 2:
This means that between every
0
-digit there is at most 2 of1
-digits.
Hint 3:
This makes it a language where after 0-2
1
-digits, there comes a possibly infinite amount of groups consisting of a0
-digit and 0-21
-digits.
Solution:
^1{0,2}(01{0,2})*$
, or equivalently and more mathematically,^(11?)?(0(11?)?)*$