Search code examples
regexgrammarregular-languageformal-languages

Converting regex to a regular grammar/right-linear grammar


I would like to verify that I am converting this regex to a right-linear grammar correctly based on the information from this previous question and the wonderful answer by Grijesh: Left-Linear and Right-Linear Grammars

Here is the question: "Write a regular (right linear) grammar that generates the set of strings denoted by the regular expression ((10)+ (011 + 1)+)* (0 + 101)*."

And here are the grammars I built up, with my final one being on the bottom:

1:
S --> 1

0:
S --> 0

10:
S --> 1A
A --> 0

(10)+:
S --> 1A
A --> 0 | 0S

011:
S --> 0A
A --> 1B
B --> 1

(011 + 1):
S --> 0A | 1
A --> 1B
B --> 1

(011 + 1)+:
S --> 0A | 1 | 1S
A --> 1B
B --> 1 | 1S

(10)+ (011 + 1)+:
S --> 1A
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B

((10)+ (011 + 1)+)*:
S --> 1A | ε
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B

0:
S --> 0

101:
S --> 1A
A --> 0B
B --> 1

(0 + 101):
S --> 0 | 1A
A --> 0B
B --> 1

(0 + 101)*:
S --> 0 | 1A | ε
A --> 0B
B --> 1

((10)+ (011 + 1)+)* (0 + 101)*:
S --> 1A | ε | E
A --> 0S | 0B
B --> 0C | 1 | 1B
C --> 1D
D --> 1 | 1B | 1E
E --> 0 | 1F | ε
F --> 0G | 0E
G --> 1 | 1E

Could anyone help me verify whether this is correct? Thanks bunches! :D


Solution

  • Everything looks right up until here:

    ((10)+ (011 + 1)+)*:
    S --> 1A | ε
    A --> 0S | 0B
    B --> 0C | 1 | 1B
    C --> 1D
    D --> 1 | 1B
    

    Your grammar for the inner expression is correct:

    (10)+ (011 + 1)+:
    S --> 1A
    A --> 0S | 0B
    B --> 0C | 1 | 1B
    C --> 1D
    D --> 1 | 1B
    

    Note that the only difference is you allowed the empty string to be produced. The Kleene closure adds more than just the empty string: it allows the whole pattern to repeat. Possibly, this could be fixed by adding productions B --> 1S and D --> 1S to the first grammar, thus allowing for an arbitrary amount of repetition.

    The same error occurs in this pair:

    (0 + 101):
    S --> 0 | 1A
    A --> 0B
    B --> 1
    
    (0 + 101)*:
    S --> 0 | 1A | ε
    A --> 0B
    B --> 1
    

    The second grammar should add productions S --> 0S and B --> 1S to allow for an arbitrary amount of repetition.

    The rest of the construction, looks right, and along with the fixes mentioned above should give a correct grammar.

    Note: you can do this algorithmically by:

    1. Using an algorithm to produce an NFA from the regular expression
    2. Using an algorithm to produce a DFA from the NFA
    3. (Optional) Using an algorithm to minimize the DFA
    4. Using an algorithm to produce a regular grammar from a DFA

    The long pole in the computational tent is step 2; this step isn't even really necessary if you are able to, instead of fully determinizing the automaton, to eliminate epsilon/lambda transitions. The reason why that would be sufficient is that the process of turning a DFA into a regular grammar makes states into nonterminal symbols and maps the transition f(q, s) = q' to the production q := sq', with q := s also if q' is accepting. As long as an NFA doesn't have empty transitions you can go ahead and use that.