Search code examples
context-free-grammarcontext-free-language

Is this context free language regular?


If I have a language {0,1} defined by the following context-free grammar with start variable S, is it a regular language? S→TS, S→1T, S→1S T→TT, T→0T1, T→1T0 T→ε

Is this language regular?

It seems to me that this language cannot be regular because it is basically any combination of terminals and variables. Whereas a regular language would need to be right or left linear. Am I right, or is my thinking incorrect here? Is there a specific process anyone recommends for determining if a context-free grammar is regular?


Solution

  • The language associated with the grammar is not regular. The rest of the post is concerned with a proof of this statement.

    First, notice that L(T) = {w over {0,1} | w contains equal number of 0s and 1s}.

    You can prove this easily.

    Proof idea. (==>) Suppose w in L(T). Then obviously has the same number of 0s and 1s.

    (<==) Suppose w contains equal number of 0s and 1s. We show by induction that w is derivable from T. If |w|<=2, then it is obviously derivable from T. Suppose, for the inductive hypothesis, that all strings of length k (even length) with equal number of 0s and 1s are derivable from T. Let w be of length k+2. If the first and the last character of w match (both 0 or both 1), apply the production T -> TT; the first and the last parts are both derivable from T by the inductive hypothesis; here, we're using the obvious property that if w[0]=w[|w|-1] then there exists an index i such that w[0..i] and w[i+1..|w|-1] are both in L(T), and both derivable from T by the inductive hypothesis. Else, if the first and the last characters of w don't match, use either T -> 0T1 or T -> 1T0; the resulting string is of length k and is derivable from T by the inductive hypothesis. QED.

    Now, the language of the grammar is the set of strings that can be generated by S. Notice that S generates strings (of terminals and variables) of the form (T+1)*1T. In other words, for any string of terminals w derivable by the grammar it must be the case that S =>* α =>* w, where α is in (T+1)*1T.

    Now it should be evident that the set of all possible terminals that can be generated by S is characterised by the regular expression (L(T)* + 1)*1L(T)*. (You can arrive at this by inspecting the set of strings of terminals that can be generated from (T+1)*1T.)

    You can simplify the expression for L(S): (L(T)+1)*1L(T)=(L(T)+1)* because the language of 1L(T) is a subset of the language of (L(T)+1)^2. Thus L(S) = (L(T)+1)*, the language comprised of binary strings that contain at least as many 1s as 0s.

    This language is not regular. You can prove this using the pumping lemma.

    Proof. Suppose, for the sake of contradiction, that L(S) is regular. Then, by the pumping lemma, there's n>0 such that every w from L(S) of length at least n can be broken into w=xyz such that:

    1. |xy| <= n,
    2. y non-empty,
    3. for all k >= 0, xy^ky in L(S).

    Let w=0^n1^n be a string of length m=2n from L(S). (The lemma talks about all sufficiently long strings from L(S), so we can pick any string we like and still have the properties.) Let w=xyz be a partitioning that exists by the lemma. Obviously, since m=2n and |xy|<=n, y is comprised of 0s only (and at least one 0 since y is not empty).

    Clearly, xy^2z has more zeros than ones and is thus not in L(S). This contradicts the pumping lemma. Thus, L(S) is not regular. QED.