Search code examples
rubytreetop

How to Eliminate Left Recursion in A Verilog Grammar Example


I am using Treetop to create a grammar for the Verilog language, and have come across some cases where the language specification involves a left recursive construct which does not translate to Treetop.

I have done some reading on this, and this answer gives a good summary of a generic way to eliminate left recursion: Left recursion elimination

However, I cannot wrap my head around how that actually works and would appreciate if someone more knowledgeable could confirm whether my approach here is correct...

For this original rule, which includes left recursion (the comment is how it is written in the language spec):

  # constant_expression ::=
  #   constant_primary
  #   | unary_operator { attribute_instance } constant_primary
  #   | constant_expression binary_operator { attribute_instance } constant_expression
  #   | constant_expression ? { attribute_instance } constant_expression : constant_expression
  rule constant_expression
    constant_primary /
    (unary_operator (s attribute_instance)* s constant_primary) /
    (constant_expression s binary_operator (s attribute_instance)* s constant_expression) /
    (constant_expression s "?" (s attribute_instance)* s constant_expression s ":" s constant_expression)
  end 

Is the following really equivalent with the left recursion removed?

  rule constant_expression
    (constant_primary constant_expression_tail?) /
    (unary_operator (s attribute_instance)* s constant_primary constant_expression_tail?)
  end

  rule constant_expression_tail
    (s binary_operator (s attribute_instance)* s constant_expression constant_expression_tail?) /
    (s "?" (s attribute_instance)* s constant_expression s ":" s constant_expression constant_expression_tail?)
  end 

Solution

  • That seems to make sense and do the same thing. One thing that might help in understanding is to rewrite like the code below. One thing to remember about PEG grammars is that if a rule fails to match, it will try to match the next alternation.

    rule constant_expression
       (constant_primary / (unary_operator (s attribute_instance)* s constant_primary)) constant_expression_tail?
    end
    
    rule constant_expression_tail
       ((s binary_operator (s attribute_instance)* s) /
        (s "?" (s attribute_instance)* s constant_expression s ":" s)) constant_expression constant_expression_tail?
    end