subtraction automata finite-automata computation-theory dfa

Solving and proving a DFA that subtracts and takes the modulus of two elements

I'm in a theory of computation class and one of the proofs we've been asked to write has the following parameters:

L = {a, b}, x ε {a, b}*, and c ε {a, b}. #c(x) is defined as the number of occurrences of character c in x.

A = {x | #a(x)-#b(x) = 0 mod 3}.

I'm pretty clear on drawing DFAs in general, but getting tripped up by the inclusion of subtraction. When proving something like this, is it better to use cases only, or try and actually draw a diagram? I'm not sure where I'd start with actually drawing here... would it involve multiple separate flows?

Any help is appreciated!

Solution

I would recommend that, in cases like this, you use the Myhill-Nerode theorem directly, either to find the number of states in a minimal DFA for the language, or to discover that the language is non-regular because it would require infinitely many states.

If you've never heard of the Myhill-Nerode theorem, you might have instead heard of the indistinguishability relation on strings with respect to a language. Two strings x and y are indistinguishable with respect to a language L if for any string w such that xw is in L, yw is also in L, and vice versa. Strings that are indistinguishable belong to equivalence classes with respect to the indistinguishability relation, and these classes correspond directly to states of a minimal DFA for the language (if there are finitely many equivalence classes, that is!).

To use this, we start checking strings in increasing lexicographical order until we find that for strings of length k, we have introduced no new equivalence classes w.r.t. indistinguishability (or we recognize some pattern which tells us that there will be infinitely many equivalence classes).

The empty string can be followed by any string in L to get a string in L (trivial). We will always have some equivalence class for the empty string. This is equivalent to saying any DFA must have at least one state - the initial state. Note: the empty string is in our language since 0 - 0 = 0 (mod 3). The state corresponding to this equivalence class will be accepting, if we get a DFA. Call the equivalence class and state [e].
The string a cannot be followed by any string in the language (1 - 0 != 0 (mod 3)), so we already know that the equivalence class must be different from [e]. The strings that can follow this are like those that can follow the empty string, except these must satisfy #a(x) - #b(x) = 2 (mod 3).
The string b cannot be followed by any string in the language (0 - 1 != 0 (mod 3)), so we know this is a new class [b]. In fact, the strings that follow the string b must satisfy #a(x) - #b(x) = 1 (mod 3) if they are to lead the string b to one in the language.
The string aa cannot be followed by any string in the language (2 - 0 != 0 (mod 3)), so we know this is not the class [e]. However, a moment's reflection will show that this string is indistinguishable from the string b in the sense that any string which takes the string aa to a string in the language also takes the string b to a string in the language, and vice versa. Consider: (aa)a, (aa)bb, …; versus: (b)a, (b)bb, … . Since aa is indistinguishable from b, we do not add a new equivalence class or state for it.
The string ab is indistinguishable from the empty string in that any string in L can be added to it to get another string in L. No new equivalence class need be defined.
The string ba is indistinguishable from ab; again, no new equivalence class is required.
The string bb is indistinguishable from the string a: (bb)b, (bb)aa, …; versus (a)b, (a)aa, …

While considering strings of length two, we did not need to introduce any new equivalence classes; this means that we are done and the language we are considering is regular. It has states corresponding to equivalence classes [e], [a] and [b]. Moreover, we can get the transitions by seeing what state corresponds to the equivalence class for the concatenation of the state's representative string (empty, a or b) and the symbol in the transition.

Concatenating a onto the empty string gives the string a, so there is a transition on symbol a from [e] to [a] since a = e.a;
Concatenating b onto the empty string gives the string b, so there is a transition on symbol b from [e] to `[b] since b = e.b;
Concatenating a onto a gives the string aa, and aa is indistinguishable from b; so, there is a transition on symbol a from [a] to [b] since aa = a.a and aa ~ b;
Concatenating b onto a gives the string ab, and ab is indistinguishable from the empty string; so, there is a transition on symbol b from [a] to [e] since ab = a.b and ab ~ e;
Concatenating a onto b gives the string ba, and ba is indistinguishable from the empty string; so, there is a transition on symbol a from [b] to [e] since ba = b.a and ba ~ e;
Concatenating b onto b gives the string bb, and bb is indistinguishable from the string a; so, there is a transition on symbol b from [b] to [a] since bb = b.b and bb ~ a.

The resulting DFA looks like this:

           b
       /--------\
       |        |
       |  /--->[a]
       V / a   | ^
----->[e]     a| |b
       ^ \ b   V |
       |  \--->[b]
       |        |
       \--------/
           a