Search code examples
regexfinite-automatadeterministic

Construct a regular expression to match the following language


I am working on a thinking exercise handed out by my professor at the end of a lecture. The problem is to construct a DFA given a specific language definition. Before I construct the DFA, the first thinking exercise is to convert the language definition into a regular expression.

The provided alphabet is binary {0, 1}

The language definition is quite informal:

The language defining the set of binary strings in which every sub-string of length 3 has as least one zero

So examples of strings that match this definition would be 000, 001, 1010 and so forth.

My trouble is coming up with a regular expression to match this language definition. I tried playing around on http://regexr.com/ but I only found that '..0' matches every three characters with a zero at the end. I'm not sure how to match every sub-string in the way the language is defined, or if it is even possible.

Is there a way to construct a regular expression for this problem?


Solution

  • Lateral thinking required. Don't implement the regex for the informal language definition, but for the property that that definition implies.

    Spoiler (hover over it for the solution):

    Hint 1:

    If any arbitrary 3-length substring must have a 0-digit, then it is impossible to have 3 digits in a row that are 1-digits.

    Hint 2:

    This means that between every 0-digit there is at most 2 of 1-digits.

    Hint 3:

    This makes it a language where after 0-2 1-digits, there comes a possibly infinite amount of groups consisting of a 0-digit and 0-2 1-digits.

    Solution:

    ^1{0,2}(01{0,2})*$, or equivalently and more mathematically, ^(11?)?(0(11?)?)*$