Search code examples
logicfinite-automatadeterministicnon-deterministic

Highest number of states - DFA / NFA


I am trying to wrap my head around some operations on regular languages, such as intersection, concatenation and Kleene star (for both DFA and NFA, and how they differ).

Imagine the following:

Assume we have L_A and L_B as regular languages defined by DFAs M_A and M_B And n_A and n_B are the number of states in M_A and M_B.

Two questions stand out:

  1. What is the highest number of states you would need in DFAs for the language L_A*?

  2. What is the highest number of states you would need in NFAs for the language L_A (intersection) L_B?

ANY help/pointers/advice on how to go about solving these questions are HIGHLY valued! I have no idea how or where to start.


Solution

  • What is the highest number of states you would need in DFAs for the language L_A*?

    The "highest number of states you would need" is another way of asking for the number of equivalence classes under the indistinguishability relation of the Myhill-Nerode theorem. Suppose L_A "needs" x states (in its minimal DFA), with x less than or equal to n_A. How many states might L_A* "need"?

    Certainly there are cases where L_A* "needs" fewer states than L_A. Consider the language {0, 1} over {0, 1}; a minimal DFA for this has three states whereas a minimal DFA for {0, 1}* has one state.

    Moreover, it's not hard to imagine languages where L_A* "needs" the same number of states: for instance, when L = L*. Suppose L_A = {0, 1}. Then L_A = {0, 1}** = {0, 1}* = L and L_A* "needs" the same number of states.

    I think your question is really about cases where we need more and, specifically, how many more we could ever possibly need in the worst case. Suppose the equivalence classes of L_A are c_1, c_2, ..., c_x. Consider these to be the sets of strings that take strings in the class to a string in L_A. Then the classes of L_A* are (c_1)(L_A*) + e, (c_2)(L_A*), ..., (c_w)(L_A*). The maximum number of distinct classes is therefore w; we can't create any new classes by applying the Kleene star. But we can certainly collapse them! It's possible that (c_a)(L_A)* = (c_b)(L_A)* for a != b. Consider L_A = {0, 1}. Then c_1 = {0, 1}, c_2 = {e}, c_3 = {}. Then (c_1)(L_A)* + e = {e, 0, 1, ...}, (c_2)(L_A)* = {e, 0, 1, ...}, (c_3)(L_A)* = {}. We end up with two distinct candidates using this method, and we can pare it down further by noticing that there are no strings in the (c_3)(L_A)* equivalence class.

    But the important point is that we can never have more; so a theoretical maximum on the number of "needed" states is x, where x is "needed" number of states of L_A.

    What is the highest number of states you would need in NFAs for the language L_A (intersection) L_B?

    Let x <= n_A be the number "needed" for L_A and y <= n_B be the number "needed" for n_B. The intersection can easily result in, e.g., the empty language, so it should be clear that the "needed" states in the intersection may be much lower than x and y. It will be the same if L_A = L_B since L_A (intersection) L_B = L_A (intersection) L_A = L_A in that case.

    Note that we can never need more than x*y since we can always use the Cartesian Product Machine construction to construct a DFA with a number of states equal to the product of the numbers of states in DFAs for L_A and L_B, and a DFA is an NFA. A natural question is whether this limit is achieved for some class of languages L_A and L_B. The answer is that it is.

    Consider L_A = {a^nk} and L_B = {a^mk} where n and m are relatively prime and greater than one. Then L_A (intersection) L_B = {a^(nm)}. A minimal DFA for L_A has n states and a minimal DFA for L_B has m states. A minimal DFA for L_A (intersection) L_B has nm states. None of these DFAs have corresponding equivalent NFAs with fewer states. The DFA for a^2k has transition table:

    q    e    q'
    q0   a    q1
    q1   a    q2
    

    With q0 accepting. Therefore the (achievable) maximum is xy <= n_An_B.