parsing grammar compiler-theory lr-grammar

Finding an equivalent LR grammar for the same number of "a" and "b" grammar?

I can't seem to find an equivalent LR grammar for:

S → aSbS | bSaS | ε

which I think recognize strings with the same number of 'a' than 'b'.

What would be a workaround for this? Is it possible to find and LR grammar for this?

Thanks in advance!

EDIT:

I have found what I think is an equivalent grammar but I haven't been able to prove it.

I think I need to prove that the original grammar generates the language above, and then prove that language is generated for the following equivalent grammar. But I am not sure how to do it. How should I do it?

S → aBS | bAS | ε

B → b | aBB

A → a | bAA

Thanks in advance...

PS: I have already proven that this new grammar is LL(1), SLR(1), LR(1) and LALR(1).

Solution

Unless a grammar is directly related to another grammar -- for example through standard transformations such as normalization, null-production eliminate, and so on -- proving that two grammars derivee the same language is very difficult without knowing what the language is. It is usually easier to prove (independently) that each grammar derives the language.

The first grammar you provide:

S → aSbS | bSaS | ε

does in fact derive the language of all strings over the alphabet {a, b}^* where the number of as is the same as the number of bs. We can prove that in two parts: first, that every sentence recognized by the grammar has that property, and second that every sentence which has that property can be derived by that grammar. Both proofs proceed by induction.

For the forward proof, we proceed by induction on the number of derivations. Suppose we have some derivation S → α → β → … → ω where all the greek letters represent sequences of non-terminals and terminals.

If the length of the derivation is exactly zero, so that it starts and ends with S, then there are no terminals in any derived sentence so its clear that every derived sentence has the same number of as and bs. (Base step)

Now for the induction step. Suppose that every derivation of length i is known to end with a derived sentence which has the same number of as and bs. We want to prove from that premise that every derivation of length i+1 ends with a sentence which has the same number of as and bs. But that is also clear: each of the three possible production steps preserves parity.

Now, let's look at the opposite direction: every sentence with the same number of as and bs can be derived from that grammar. We'll do this by induction on the length of the string. Our induction premise will be that if it is the case that for every j ≤ i, every sentence with exactly j as and j bs has a derivation from S, then every sentence with exactly i+1 as and i+1 bs. (Here we are only considering sentences consisting only of terminals.)

Consider such a ssentence. It either starts with an a or a b. Suppose that it starts with an a: then there is at least one b in the sentence such that the prefix ending with that b has the same number of each terminal. (Think of the string as a walk along a square grid: every a moves diagonally up and right one unit, and every b moves diagonally down and right. Since the endpoint is at exactly the same height as the beginning point and there are no wormholes in the graph, once we ascend we must sooner or later descend back to the starting height, which is a prefix ending b.) So the interior of that prefix (everything except the a at the beginning and the b at the end) is balanced, as is the remainder of the string. Both of those are shorter, so by the induction hypothesis they can be derived from S. Making those substitutions, we get aSbS, which can be derived from S. An identical argument applies to strings starting with b. Again, the base step is trivial.

So that's basically the proof procedure you'll need to adapt for your grammar.

Good luck.

By the way, this sort of question can also be posed on cs.stackexchange.com or math.stackexchange.com, where the MathJax is available. MathJax makes writing out mathematical proofs much less tedious, so you may well find that you'll get more readable answers there.