regex grammar regular-language formal-languages

Converting regex to a regular grammar/right-linear grammar

I would like to verify that I am converting this regex to a right-linear grammar correctly based on the information from this previous question and the wonderful answer by Grijesh: Left-Linear and Right-Linear Grammars

Here is the question: "Write a regular (right linear) grammar that generates the set of strings denoted by the regular expression ((10)+ (011 + 1)+)* (0 + 101)*."

And here are the grammars I built up, with my final one being on the bottom:

1:
S --> 1

0:
S --> 0

10:
S --> 1A
A --> 0

(10)+:
S --> 1A
A --> 0 | 0S

011:
S --> 0A
A --> 1B
B --> 1

(011 + 1):
S --> 0A | 1
A --> 1B
B --> 1

(011 + 1)+:
S --> 0A | 1 | 1S
A --> 1B
B --> 1 | 1S

(10)+ (011 + 1)+:
S --> 1A
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B

((10)+ (011 + 1)+)*:
S --> 1A | ε
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B

0:
S --> 0

101:
S --> 1A
A --> 0B
B --> 1

(0 + 101):
S --> 0 | 1A
A --> 0B
B --> 1

(0 + 101)*:
S --> 0 | 1A | ε
A --> 0B
B --> 1

((10)+ (011 + 1)+)* (0 + 101)*:
S --> 1A | ε | E
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B | 1E
E --> 0 | 1F | ε
F --> 0G | 0E
G --> 1 | 1E

Could anyone help me verify whether this is correct? Thanks bunches! :D

Solution

Everything looks right up until here:

((10)+ (011 + 1)+)*:
S --> 1A | ε
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B

Your grammar for the inner expression is correct:

(10)+ (011 + 1)+:
S --> 1A
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B

Note that the only difference is you allowed the empty string to be produced. The Kleene closure adds more than just the empty string: it allows the whole pattern to repeat. Possibly, this could be fixed by adding productions B --> 1S and D --> 1S to the first grammar, thus allowing for an arbitrary amount of repetition.

The same error occurs in this pair:

(0 + 101):
S --> 0 | 1A
A --> 0B
B --> 1

(0 + 101)*:
S --> 0 | 1A | ε
A --> 0B
B --> 1

The second grammar should add productions S --> 0S and B --> 1S to allow for an arbitrary amount of repetition.

The rest of the construction, looks right, and along with the fixes mentioned above should give a correct grammar.

Note: you can do this algorithmically by:

Using an algorithm to produce an NFA from the regular expression
Using an algorithm to produce a DFA from the NFA
(Optional) Using an algorithm to minimize the DFA
Using an algorithm to produce a regular grammar from a DFA

The long pole in the computational tent is step 2; this step isn't even really necessary if you are able to, instead of fully determinizing the automaton, to eliminate epsilon/lambda transitions. The reason why that would be sufficient is that the process of turning a DFA into a regular grammar makes states into nonterminal symbols and maps the transition f(q, s) = q' to the production q := sq', with q := s also if q' is accepting. As long as an NFA doesn't have empty transitions you can go ahead and use that.