Regular languages & Proofs (models of computation)

I need help with a couple HW questions I got for my models of computation course. I read the relevant chapters in the text (Michael Sipser's Introduction to the Theory of Computation), but I feel like these HW questions necessitate knowledge the book hasn't given me... The first question is this:

(1) Prove that there does not exist a language L such that L* = {a}* {b}* The hint is to use contradiction.

Obviously the right side is a regular expression; zero or more a's followed by zero or more b's. This could also be the empty string epsilon.

My trouble comes in with the L*. I have absolutely no clue what a * applied to a language does, or how to work with this equation because of the * on the language L.

Any help, or resources for this question would be appreciated.

=====

The second question is easier, and I feel like I can almost do it. I can justify it to myself, so I guess the problem is writing it formally. The second question is this:

(2) Prove that if A and V are languages over the same alphabet (sigma) and A (is a subset of) B, then A* (is a subset of) B*. The hint is to use induction.

Now obviously (for example) if sigma = {1, 0}

and A = {00, 01, 10, 11}

and B = {00, 01, 10, 11, 100, 101, 110, 111}

Then anything in A* is closed under B* because B = A + some other things... If someone could help me formalize this into an inductive form I'd appreciate it.

Thanks

Solution

(1) Here's a useful recursive definition of L*:

epsilon is in L*
if x is in L, x is in L*
if x and y are in L*, so is xy
L* is the smallest language satisfying 1. - 3. for a given L

Given this definition, here's a proof by contradiction: suppose R* = a*b*. Then ab is in R*. By 3., abab must also be in R*. But then R* != a*b*, a contradiction. So R* = a*b* must be false; in other words, for no language R is R* = a*b*.

The intuition is pretty easy: L* is the language of all strings that can be formed by concatenating zero or more (with repeats allowed) strings from L. The recursive definition works by allowing for zero strings (in 1.), strings taking exactly one string from L (in 2.), and strings formed by joining potentially many strings from L (in 3.).

(2) Using the definition I give above, we proceed inductively on the number of concatenated strings in A*. For 0 concatenations, the empty string is in both A* and B*, so we're good (see rule 1.). For 1 concatenation, since A is a subset of B, strings in A will be in A* (see rule 2.), and stringsfrom B will be in B* (same rule), so all strings taking one concatenation in A* are also in B*. Now, assume that all strings taking k concatenations in A* are also in B*. What about strings taking k+1 concatenations in A*? Well, these are formed using rule 3. on strings x and y in A* taking strictly less than k+1 concatenations, i.e., at most k concatenations. In other words, any string in A* taking k+1 concatenations can be rewritten as xy, where x and y are in A*, and x and y take at most k concatenations. Since all strings in A* taking at most k concatenations are also in B* (by our hypothesis), we have that x and y are in B*. By rule 3., xy must also be in B*. Therefore, strings taking k+1 concatenations in A* must also belong to B*. This completes the proof.

Note: this glosses over some details and isn't terribly formal, but you should get the idea and hopefully be able to shape it up. If you need something more polished than this, let me know, and I can try to work with you on it.