Converting a context free grammar into a regular grammar

E -> EAE | (E) | -E | id

A -> + | - | * | /

The terminal set is {id, -,+,*,/} and the starting symbol is E.

I want to convert this grammar to a regular grammar. I tried canceling the left recursion of this grammar and I got:

E -> (E)X | -EX | idX

A -> + | - | * | /

X -> AEX | ε

Is this it or is there something else I need to do?

Solution

You are doomed to failure because your language is not a regular language. There cannot be a correct regular grammar for it. To show this is the case, we can use the Myhill-Nerode theorem which says that a minimal DFA for a regular language has as many states as there are equivalence classes over the indistinguishability relation with respect to the language. If we can show there are infinitely many strings distinguishable with respect to the language of your grammar - which we will do below - then that would imply a minimal DFA for your language would have infinitely many states. DFAs can only have finitely many states, so that effectively means you know your language isn't regular.

To show there are infinitely many distinguishable strings, it suffices to find an infinite sequence of strings w1, w2, ..., wn, ... such that each one is distinguishable from all the others, with respect to the language of your grammar. One way I often approach this is to argue about what the shortest string is that will take each of the strings in our sequence to a string in the language. If you can show that each of the wi has a different shortest string than any of the others, you are done - since two sets cannot be the same if the shortest strings in them have different lengths.

To find a suitable sequence, we can look at your grammar and look for rules that seem non-regular. E -> EAE and E -> (E) both seem problematic. My gut feeling is that because E -> (E) is simpler (it involves fewer nonterminal symbols) it will be easier to work with. Let's define a sequence of words w1, w2, ..., wn, ... that exploits this production to get strings that are distinguishable. If this production is used over and over again, and then E -> Id is used, you get strings of the form (^n id )^n, where the parentheses are matched. This is important - regular languages can't do this. We show this using Myhill-Nerode as follows:

Assume the language is regular. Then by Myhill-Nerode, the number of states in a minimal DFA is the number of equivalence classes over the indistinguishability relation with respect to the language. Consider the infinite sequence of strings (, ((, ..., (^n, ... The shortest strings that can be appended to each of these to get a string in the language are Id), Id)), ..., Id)^n, ..., of lengths 3, 4, ..., n+2, ... This means that each of the strings is distinguishable from every other one, so there are infinitely many equivalence classes. This is a contradiction, so the assumption the language was regular was wrong.