Search code examples
regexpattern-matchingregular-languagefsmdfa

Given a sequence of symbols, how to find the minimal DFA that can accept it?


For example: Given the below symbol sequence,

a b c b c d d d b c b c d d d d e

The simplest DFA that can accept it is a chain of 17 states.

While the below regular expression can derive the above sequence:

a (b c)* (d)* (b c)* (d)* e

And the corresponding minimal DFA has 8 states.

Further, the regular expression a ((b c)* (d)*)* e has even smaller minimal DFA with 4 states. And it can accept the example sequence.

In the above example, I only considers the * operator; More general, operator | can also be considered to reduce the DFA size.

So, the general question is:

Given a sequence of symbols, how to find the minimal DFA that can accept it?


Solution

  • Easy. A DFA with one state can always do it. That one state is a start state, accepting state, and all symbol transitions loopback into it. That trivial DFA accepts all strings (∑*), and is definitely the smallest-possible DFA.