Techniques for simplifying a regular expression by hand

I'm trying to prove that the RE

s(bs)*[(abb*a*)*ab]*aa(a∪b) where s=b*a*ab

can be simplified to

(a∪b)*abaa(a∪b),

but all the general transformations I know such as (ab)*a=a(ba)* or (a*b)*a*=(a∪b)* seem to have no effect.

So the questions are:

Is a step-by-step transformation process here viable? If so, can you show me how it is done?
If other types of technique are needed for the proof, what are they?
Is there a clear & comprehensive reference explaining techniques used in simplifying formal RE's?

Thanks.

Solution

There are lots of ways to do this.

One simple way to do it is to convert the regular expressions to NFAs. To see if two NFAs recognize the same language:

Consider the starting state of both NFAs.
Take the epsilon closure of the states you are considering for each NFA.
If the set of states for one NFA contains at least one accepting state but the set for the other NFA contains no accepting states, then the NFAs are not equal and you are done.
Otherwise, for each symbol, follow that symbol to get a new set of states for each NFA, and go back to step 1.

This will take a finite number of steps because there are a finite number of sets of states to consider.

You could also convert both regular expressions to minimal DFAs and show that they are isormorphic, but this technique is really just the same thing as the above technique but with the steps in a different order.

Illustration

Consider the REs aa* and a*a. It's obvious that they recognize the same language, but we'll use them to illustrate the algorithm.

NFAs

First we consider the starting states for each NFA. These are {1} for the left RE and {1} for the right RE, which we will write as {1},{1}.
Then we take the epsilon closure, which is {1},{1,2}.
Neither set contains an ending state, so we continue.
The outgoing symbols are only a, so we take all a transitions to get {2},{1,3}.
The epsilon closure is {2,3},{1,2,3}.
Both contain ending states, so we continue.
Again, the outgoing symbols are a, so we take all a transitions to get {2},{1,3}.
The epsilon closure is {2,3},{1,2,3}. This is the same as step #5, so we don't do it again.

Since there is nothing else to examine, we are done, and the two have been proven identical.

Techniques for simplifying a regular expression **by hand**

Illustration

Techniques for simplifying a regular expression by hand