Search code examples
regular-languagensregularexpressionautomatadfaautomata-theory

Regular expression over the language C={a,b}


Good evening everyone,i'm getting stuck with the following regular expression,

I think there is a much easier approach to the expression than mine,

I had to writing down the regular expression and the dfa that from the alphabet {a,b} accepted all the strings that start and end with b and have an even number of a.

My attempt was going into cases, but the outcome, wasn't to great :

I tried something like this : b b* (aba)* (aab)* (aa)* (aab)* (aba)* b*b

But i think this is not complete.

Should i follow some general rule to achieve this task? Or i just need to practice regular expressions ?

Thanks,any tip or help would be appreciated.


Solution

  • A DFA seems easier to make here, so we can start there and derive the regular expression from there.

    We will need at least one, initial, state. This state cannot be accepting because the empty string does not start and end with b. We'll call this q0.

    If we see an a in this state, we are looking at a string that does not start with b, so we cannot accept it no matter what comes next. We can represent this with a new, dead, state. We'll call this q1.

    If we see a b in q0, we need a new state to represent the fact that we are well on the way to seeing a string that meets the criteria. Indeed, the string b starts and ends with a b and has an even number of a (zero is even); so this state must be accepting. Call this q2.

    If we see an a in q2 then we have an odd number of as and did not last see a b, so we cannot accept the string. However, it's still possible to accept a string from this state by seeing an odd number of as followed by at least one b. Call the state to represent this q3.

    If we see a b in q2, we're in the same situation as before (even number of a and last saw a b, so we can accept). Stay in q2.

    If in q3 and we see an a, we now have an even number of a again and just need a b. Call this new state q4. If we see a b, we still need an a so we might as well stay in q3.

    If in q4 and we see an a, we again need more as and can return to q3. If, on the other hand, we get a b, we can return to q2 since the string is in our language.

    The DFA looks like this:

     q     s    q'
    --    --    --
    q0     a    q1        q0: initial state
    q0     b    q2        q1: dead state, did not begin with b
    q1     a    q1        q2: accepting state, even #a and start/stop with b
    q1     b    q2        q3: start with b, odd #a
    q2     a    q3        q4: start with b, even #a, stop with a
    q2     b    q2
    q3     a    q4
    q3     b    q3
    q4     a    q3
    q4     b    q2
    

    To get the regular expression we can find regular expressions leading to each state, iteratively, and then take the union of regular expressions for accepting states. In this case, only q2 is accepting so all we need is a regular expression for that state. We proceed iteratively, substituting at each stage.

    round 0
    (q0): e
    (q1): (q0)a + (q1)(a+b)
    (q2): (q0)b + (q2)b + (q4)b
    (q3): (q2)a + (q3)b + (q4)a
    (q4): (q3)a
    
    round 1
    (q0): e
    (q1): a + (q1)(a+b) = a(a+b)*
    (q2): b + (q2)b + (q4)b = (b+(q4)b)b*
    (q3): (q2)a + (q3)b + (q4)a = ((q2)+(q4))ab*
    (q4): (q3)a
    
    round 2
    (q0): e
    (q1): a(a+b)*
    (q2): (b+(q3)ab)b*
    (q3): ((q2)+(q3)a)ab* = (q2)ab* + (q3)aab* = (q2)ab*(aab*)*
    (q4): (q3)a
    
    round3:
    (q0): e
    (q1): a(a+b)*
    (q2): (b+(q3)ab)b*
    (q3): (b+(q3)ab)b*ab*(aab*)* = bb*ab*(aab*)*+(q3)abb*ab*(aab*)* = bb*ab*(aab*)*(abb*ab*(aab*)*)*
    (q4): (q3)a
    
    round4:
    (q0): e
    (q1): a(a+b)*
    (q2): (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
    (q3): bb*ab*(aab*)*(abb*ab*(aab*)*)*
    (q4): bb*ab*(aab*)*(abb*ab*(aab*)*)*a
    

    Therefore, a regular expression is this:

    r = (b+bb*ab*(aab*)*(abb*ab*(aab*)*)*ab)b*
      = bb* + bb*ab*(aab*)*(abb*ab*(aab*)*)*abb*
    
    1. The bb* part encodes the fact that any string of b is a string in the language.
    2. the other part begins and ends with bb* which encodes the fact that any string must begin and end with b
    3. The outermost as encode the fact that any string in the language with a must have both a first and a last a
    4. The aab* parts allow there to be contiguous pairs of a
    5. The abb*ab* part allow allows there to be non-contiguous pairs of a

    As a final note, the rules for replacing expressions as above are the following:

    A: r       r is an expression
    B: As      s is an expression
    =
    A: r
    B: rs
    
    
    A: r + As  r, s are expressions
    =
    A = rs*