How do you handle a missing nonterminal node in a zero-or-more repetition when using elements.map in Ruby Treetop?


I am trying to create a custom syntax node class that maps over all of its nonterminal nodes. The problem is that one of the nodes does not necessarily have to be present, which breaks elements.map in the custom syntax node class: the syntax tree contains a SyntaxNode "" in its place, and I have not created a class for that node.

grammar Foo
  rule textStructure
    beginDoc twoOrMoreNewLines (block twoOrMoreNewLines)* endDoc <Structure>
  end

  rule beginDoc
    'begin document' <BeginLine>
  end

  rule twoOrMoreNewLines
    "\n" 2.. <NewLine>
  end

  rule block
    !beginDoc information:((!"\n" .)+) <BlockLine>
  end

  rule endDoc
    'end document' <EndLine>
  end
end

# In a different file


module Foo
  class Structure < Treetop::Runtime::SyntaxNode
    def to_array
      return self.elements.map {|x| x.to_array}
    end
  end

  class BeginLine < Treetop::Runtime::SyntaxNode
    def to_array
      return self.text_value
    end
  end

  class NewLine < Treetop::Runtime::SyntaxNode
    def to_array
      return self.text_value
    end
  end

  class BlockLine < Treetop::Runtime::SyntaxNode
    def to_array
      return self.information.text_value
    end
  end

  class EndLine < Treetop::Runtime::SyntaxNode
    def to_array
      return self.text_value
    end
  end
end

For example, if I try to parse "begin document\n\nend document", I would expect this output: ["begin document", "\n\n", "end document"]. Instead I get the error message: `block in to_array': undefined method `to_array' for SyntaxNode offset=16, "":Treetop::Runtime::SyntaxNode (NoMethodError).

I investigated further and found that the syntax tree does indeed contain a SyntaxNode "" at offset=16, which I believe is there because (block twoOrMoreNewLines)* matched zero times.

How do I handle this problem? Is there a way to avoid SyntaxNode "" from being created?


Solution

  • The SyntaxNode at offset 16 represents the iterated sub-rule and contains an empty array of children. It is needed for the packrat parsing algorithm to work. You should not call to_array on an arbitrary SyntaxNode; handle this node specially instead. The best way is to label it and check whether the labelled node is empty before iterating over its elements:

    rule textStructure
      beginDoc twoOrMoreNewLines nested:(block twoOrMoreNewLines)* endDoc <Structure>
    end
    

    ...

    class Structure < Treetop::Runtime::SyntaxNode
      def to_array
        return nested.empty? ? [] : nested.elements.map {|x| x.to_array}
      end
    end
    

    or something like that.
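
A fuller sketch of the same idea, under some assumptions: each element of the labelled repetition appears to be a plain sequence node wrapping one block and one twoOrMoreNewLines, so the mapping has to descend one level before calling to_array, and the begin and end lines have to be added back to get the output the question expects. To keep the example runnable without the treetop gem, Node below is a hypothetical stand-in for Treetop::Runtime::SyntaxNode, and structure_to_array is a hypothetical helper standing in for Structure#to_array; neither is part of Treetop.

```ruby
# Hypothetical stand-in for Treetop::Runtime::SyntaxNode (illustration only).
# `elements` is nil for a terminal and an array for a sequence or repetition.
Node = Struct.new(:text_value, :elements) do
  def to_array
    text_value
  end
end

# Hypothetical helper mirroring what Structure#to_array could do once the
# repetition is labelled `nested` in the grammar.
def structure_to_array(begin_doc, newlines, nested, end_doc)
  # Each element of `nested` is assumed to be one (block twoOrMoreNewLines)
  # sequence node, so descend one level and collect the children's values.
  pairs = (nested.elements || []).flat_map do |pair|
    pair.elements.map(&:to_array)
  end
  [begin_doc.to_array, newlines.to_array, *pairs, end_doc.to_array]
end

begin_doc = Node.new("begin document", nil)
newlines  = Node.new("\n\n", nil)
end_doc   = Node.new("end document", nil)

# Case 1: the repetition matched zero times (the SyntaxNode "" from the question).
empty_nested = Node.new("", [])
result_empty = structure_to_array(begin_doc, newlines, empty_nested, end_doc)

# Case 2: the repetition matched once.
pair = Node.new("some text\n\n",
                [Node.new("some text", nil), Node.new("\n\n", nil)])
one_nested = Node.new("some text\n\n", [pair])
result_one = structure_to_array(begin_doc, newlines, one_nested, end_doc)

p result_empty
p result_one
```

Because an empty repetition has an empty elements array, flat_map yields nothing for it and no separate emptiness check is needed in this sketch.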