regex concatenation regular-language set-theory set-union

Regular Language Closure Unconcatenation

I'm trying to find an operation that can take a regular language and "unconcatenate" it with another. For example:

a*L - a* = L | where L is a regular language

I know that difference (subtraction) isn't the operation I want. But I believe I'm getting my point across.

Another way to look at it is if there have a set L that is logically equal to (A ∪ B), but we do not have access to A. So if we can only use L, B, and derivations of such, can we somehow derive A. Basically:

L - B = A | L = (A ∪ B)

I have put plenty of thought into this problem, using many variations of compliment, intersection, and other closure properties of regular languages, but I simply can't figure it out.

The best I've managed to come up with is:

A = ((L - B) ∪ (A ∩ B) | L = (A ∪ B)

However this requires A on the right side.

Solution

If L = A U B, define an operator - such that L - B = A.

The problem with this is that the operator - is not well-defined: Given L and B, there are potentially several languages which satisfy L = A U B. In particular, if A is a subset of L and any (possibly improper) superset of L \ B, then A is a solution; that is, if A = (L \ B) U C, where C is a (possibly improper) subset of B, then L - B might as well be equal to that set.

Now, you could define - to mean the set of all such A, and in that case, you could make this workable using set difference, union and power set operators. Then, L - B = Q where Q = {(L \ B) U {}, (L \ B) U {B[0]}, ..., (L \ B) U B = L}.

You can make this well-defined if you specify - always returns the "smallest" element of Q (for finite sets, the one with the fewest elements; for infinite sets, the one which is a subset of all other sets) in which case you recover simply L \ B.

If L = B.A, define an operator - such that L - B = A.

A similar problem exists here: there may be several languages which, when appended to B, give L. For example, consider B = a*, and two choices for A: a* and {e}, the language containing only the empty set. You can show without much effort that a* a* = a* e, so L is the same either way, B is the same, and L - B must now produce two different values: either a* or {e}.