regex context-free-grammar regular-language formal-languages

How do I find the language from a regular expression?

How would I find the language for the following regular expressions over the alphabet {a, b}?

aUb*
(ab*Uc)
ab*Ubc*
a*bc*Uac

EDIT: Before i get downvoted like crazy, i'd appreciate it if someone could show me the steps towards solving these problems, not just the solution. Maybe even walking me through one so I can do the rest on my own.

Thanks!

Solution

Edit: short answer, * means "zero or more repetitions" in almost all regex/grammar syntaxes include perl5 and RFC 5234. * typically binds more tightly than concatenation and alternation.

You say you want a language over the alphabet (a, b), but include c and U in your expressions. I'm going to assume that you want a language grammar over the alphabet (a, b, c) in a form like BNF given a regular expression where U is a low-precedence union operator, similar to | in perl re.

In that case,

a|b*

is equivalent to the BNF:

L := a
   | <M>
M := epsilon
   | b <M>

The * operator means zero or more, so the first clause in M is the base case, and the second clause is a recursive use of M that includes a terminal b.

L is just either a single terminal a or the nonterminal M.

(ab*|c)

L ::= a <M>
    | c
M ::= epsilon
    | b <M>

M is derived the same way as above, and the rest is self explanatory.

ab*|bc*

L ::= a <M>
    | b <N>
M ::= epsilon
    | b <M>
N ::= epsilon
    | c <N>

N is derived in the same way as M above.

a*bc*|ac

The * in most regular expression languages binds more tightly than concatenation, so this is the same as

(a*b(c*))|(ac)

which boils down to

L ::= <M> b <N>
    | a c
M ::= epsilon
    | a <M>
N ::= epsilon
    | c <N>

In general to convert a regex to BNF, you simply use adjacency in a regex to mean adjaceny in BNF, and U or | in a regex means | in BNF.

If you define a nonterminal <X> ::= x then you can handle x* thus:

L ::= epsilon
    | <X> <L>

With the same nonterminal <X> ::= x then you can handle x+ thus:

L ::= <X>
    | <L> <X>

That gets you the repetition operators * and + which leaves ?. x? is simply

L ::= epsilon
    | <X>