The language is as follows.
∑={a,b,c}
L = the second and third to last characters of ω are the same, ω has length greater than 5 and ω contains ccc.
I tried doing it and I'm not sure if it's right. Got the following:
((ccc)(aUbUc)*(a)(a)(aUbUC)) U ((ccc)(aUbUc)*(b)(b)(aUbUC)) U ((ccc)(aUbUc)*(c)(c)(aUbUC))
Is this correct?
Nope, that's not correct. For example, your expression does not recognize the string aaccc
, because all of your subexpressions require that your strings start with ccc
, which is different from what the language description would indicate.
Some of your expression is right, such as needing to split out the aa
, bb
and cc
parts. You are overusing parentheses a bit, but that's just a matter of taste rather than correctness.
Your basic unit is (aUbUc)
, representing ∑
. A string must contain ccc
somewhere in it, so let's start with just that:
(aUbUc)*ccc(aUbUc)*
This is all strings containing ccc
. The next requirement is complicated: the third-from-last and second-from-last characters have to be the same. This could overlap with the ccc
part. If it didn't, this would be sufficient:
(aUbUc)*ccc(aUbUc)*(aaUbbUcc)(aUbUc)
But this doesn't allow us to have a string like accca
or aaccc
. Note, however, that it does require all strings to be at least length 6, so it satisfies the requirement that strings be of length greater than 5. I'm going to abbreviate and use ∑
instead of (aUbUc)
to make this smaller:
(∑*ccc∑*(aaUbbUcc)∑)U(∑∑∑∑*ccc)U(∑∑∑*ccc∑)
Note the extra ∑
s, which are required to pad out the other sub-expressions to make sure all paths have string length greater than 5.
An alternative to coming up with a regular expression directly is to build a DFA that matches this language, and then convert it to a regular expression. You'll find similar concerns when building your DFA, such as needing to make sure that you cover the overlap case, where ccc
is near the end of the string. To make that a bit easier, you can start with an NFA and then convert your NFA into a DFA.