For example: Given the below symbol sequence,
a b c b c d d d b c b c d d d d e
The simplest DFA that can accept it is a chain of 17 states.
While the below regular expression can derive the above sequence:
a (b c)* (d)* (b c)* (d)* e
And the corresponding minimal DFA has 8 states.
Further, the regular expression a ((b c)* (d)*)* e
has even smaller minimal DFA with 4 states. And it can accept the example sequence.
In the above example, I only considers the *
operator; More general, operator |
can also be considered to reduce the DFA size.
So, the general question is:
Given a sequence of symbols, how to find the minimal DFA that can accept it?
Easy. A DFA with one state can always do it. That one state is a start state, accepting state, and all symbol transitions loopback into it. That trivial DFA accepts all strings (∑*), and is definitely the smallest-possible DFA.