How would I find the language for the following regular expressions over the alphabet {a, b}?
aUb*
(ab*Uc)
ab*Ubc*
a*bc*Uac
EDIT: Before i get downvoted like crazy, i'd appreciate it if someone could show me the steps towards solving these problems, not just the solution. Maybe even walking me through one so I can do the rest on my own.
Thanks!
Edit: short answer, *
means "zero or more repetitions" in almost all regex/grammar syntaxes include perl5 and RFC 5234. *
typically binds more tightly than concatenation and alternation.
You say you want a language over the alphabet (a, b), but include c
and U
in your expressions. I'm going to assume that you want a language grammar over the alphabet (a, b, c) in a form like BNF given a regular expression where U
is a low-precedence union operator, similar to |
in perl re.
In that case,
a|b*
is equivalent to the BNF:
L := a
| <M>
M := epsilon
| b <M>
The *
operator means zero or more, so the first clause in M
is the base case, and the second clause is a recursive use of M
that includes a terminal b
.
L is just either a single terminal a
or the nonterminal M
.
(ab*|c)
L ::= a <M>
| c
M ::= epsilon
| b <M>
M is derived the same way as above, and the rest is self explanatory.
ab*|bc*
L ::= a <M>
| b <N>
M ::= epsilon
| b <M>
N ::= epsilon
| c <N>
N is derived in the same way as M above.
a*bc*|ac
The *
in most regular expression languages binds more tightly than concatenation, so this is the same as
(a*b(c*))|(ac)
which boils down to
L ::= <M> b <N>
| a c
M ::= epsilon
| a <M>
N ::= epsilon
| c <N>
In general to convert a regex to BNF, you simply use adjacency in a regex to mean adjaceny in BNF, and U
or |
in a regex means |
in BNF.
If you define a nonterminal <X> ::= x
then you can handle x*
thus:
L ::= epsilon
| <X> <L>
With the same nonterminal <X> ::= x
then you can handle x+
thus:
L ::= <X>
| <L> <X>
That gets you the repetition operators *
and +
which leaves ?
. x?
is simply
L ::= epsilon
| <X>