Much has been written about the performance of algorithms for minimizing DFAs. That's frustrating my Google-fu because that's not what I'm looking for.
Can we say anything generally about the performance characteristics of a non-minimal DFA? My intuition is that the run time of a non-minimal DFA will still be O(n) with respect to the length of the input. It seems that minimization would only affect the number of states and hence the storage requirements. Is this correct?
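To make the intuition concrete, here is a minimal sketch of a table-driven DFA simulator (the function and variable names are illustrative, not from any library). The inner loop does one table lookup per input symbol, so the run time is O(n) in the input length no matter how many states the DFA has; extra, non-minimal states only enlarge the transition table.

```python
def run_dfa(delta, start, accepting, s):
    """Simulate a DFA. delta: dict mapping (state, symbol) -> state."""
    state = start
    for ch in s:                    # exactly one step per input symbol
        state = delta.get((state, ch))
        if state is None:           # missing transition: reject immediately
            return False
    return state in accepting

# A deliberately non-minimal DFA for the language r* over {'r'}:
# states 0 and 1 are equivalent (both accepting, both loop on 'r'),
# yet the simulation cost per character is unchanged.
delta = {(0, 'r'): 1, (1, 'r'): 1}
print(run_dfa(delta, 0, {0, 1}, 'rrr'))  # True
print(run_dfa(delta, 0, {0, 1}, 'rrx'))  # False
```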
Can we refine the generalizations if we know something about the construction of the NFA from which the DFA was derived? For example, say the NFA was constructed entirely by applying concatenation, union, and Kleene star operations to primitive automata that match either one input symbol or epsilon. With no way to remove a transition or create an arbitrary transition, I don't think it's possible to have any dead states. What generalizations can we make about the DFAs constructed from these NFAs? I'm interested in both theoretical and empirical answers.
Regarding your first question, on the runtime of the non-minimal DFA: purely theoretically, your intuition that it should still run in O(n) is correct. However, consider (as an example) the following pseudo-code for the Kleene-star operator:
// given that the Kleene-star operator starts at i = something
while string[i] == 'r':
    accepting = true;
    i++;
while string[i] == 'r':
    accepting = true;
    i++;
// here the testing of the input string can continue at i+1
As you can see, the first two while-loops are identical and could be understood as a redundant state. However, "splitting" while-loops like this will decrease (among other things) your branch-prediction accuracy and therefore the overall runtime (see Mysticial's brilliant explanation of branch prediction for more details).
Many other, similar "practical" arguments can be made for why a non-minimal DFA will be slower. Among them, as you mentioned: higher memory usage (and in many cases more memory means slower, since memory is, by comparison, a slower part of the computer); more "ifs", since each additional state requires input checking for its successors; and possibly more loops (as in the example), which would make the algorithm slower not only because of branch prediction, but simply because some programming languages are very slow on loops.
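The redundant-state situation above can also be detected mechanically. Below is a sketch of Moore-style partition refinement (the helper names are my own, and it assumes a total transition function and no unreachable states): two structurally different non-minimal DFAs for the same language r* both collapse to a single state.

```python
def minimize_size(states, alphabet, delta, accepting):
    """Return the number of states of the minimal DFA.
    delta: dict (state, symbol) -> state, assumed total."""
    # Initial partition: accepting vs. non-accepting states.
    partition = [s for s in (set(accepting), set(states) - set(accepting)) if s]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Group states by which block each successor falls into.
            def signature(q):
                return tuple(
                    next(i for i, b in enumerate(partition) if delta[(q, a)] in b)
                    for a in alphabet)
            groups = {}
            for q in block:
                groups.setdefault(signature(q), set()).add(q)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True   # a block split; refine again
        partition = new_partition
    return len(partition)

# DFA A: two equivalent accepting states looping on 'r'.
size_a = minimize_size([0, 1], ['r'], {(0, 'r'): 1, (1, 'r'): 1}, {0, 1})
# DFA B: three equivalent accepting states arranged in a cycle.
size_b = minimize_size([0, 1, 2], ['r'],
                       {(0, 'r'): 1, (1, 'r'): 2, (2, 'r'): 0}, {0, 1, 2})
print(size_a, size_b)  # prints "1 1": both collapse to one state
```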
Regarding your second question: here I am not sure what you mean. After all, if you do the conversion properly, you should derive a fairly optimal DFA in the first place.
EDIT: In the discussion, the idea came up that several non-minimal DFAs with different efficiencies (in whatever measure chosen) could be constructed from one NFA, differing not in the implementation but in the structure of the DFA itself. This is not possible, for there is only one minimal ("optimal") DFA for a given language. This is an outline of a proof: