Search code examples
rubytreetop

Turning a Treetop parse tree into an abstract syntax tree (AST)


I have simplified a grammar expressed in Treetop, and I am trying to filter the parser's output into an AST, using custom nodes.

grammar Elem

 rule top
   lpar 'top' space 
      args_:(lpar 'args' space ((ident / number) space?)* rpar)  space? 
   rpar <Top>
 end

 rule ident
   [a-zA-Z] [a-zA-Z0-9_]* <Ident>
 end

 rule number
   [0-9]+ <Number>
 end

 rule space
  [\s]+
 end

 rule lpar
  space? '(' space? 
 end

 rule rpar
  space? ')' space? 
 end
end

Basically, it can parse the following example :

(top (args foo bar 42))

The custom nodes all inherit Treetop::Runtime::SyntaxNode

Now, I need to filter the parse tree generated by Treetop into an AST.

I follow the strategy explained here , without success : my AST is just empty...

My compiler driver is the following :

require 'treetop'
require 'pp'

require_relative 'elem'
require_relative 'node_extension'

class ElemCompiler
  def initialize
    @parser=ElemParser.new
  end

  def compile filename
    puts "==> compiling #{filename}"
    @ast=parse(filename)
    puts "==> AST in memory. Good."
  end

  def parse filename
    pp tree=@parser.parse(IO.read(filename))
    pp clean(tree)
  end

  private

  def clean(root_node)
    return if(root_node.elements.nil?)
    pp root_node.elements.collect {|node| node.class.name =="Treetop::Runtime::SyntaxNode" }
    pp root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
    root_node.elements.each {|node| clean(node) }
  end


end

 compiler=ElemCompiler.new.parse ARGV[0]

What am I missing ?


Solution

  • Your code indeed parses correctly the supplied expression.

    There is a small error, though, in the clean method:

    def clean(root_node)
        return if(root_node.elements.nil?)
        pp root_node.elements.collect {|node| node.class.name =="Treetop::Runtime::SyntaxNode" }
        pp root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
        root_node.elements.each {|node| clean(node) }
    end
    

    The clean method is returning the last expression evaluated, which is the each method of the array elements. What you want to return, really, is the root node, so the line pp clean(tree) would actually print the resulting, clean tree, not the result of the each expression.

    You can solve in two ways, one is adding root_node as return expression:

    def clean(root_node)
        (...)
        pp root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
        root_node  # here
    end
    

    Or you can change the parse method to the following:

    def parse filename
        pp tree = @parser.parse(IO.read(filename))
        clean(tree) # we clean the tree
        pp tree     # since tree is an object, side-effects will persist here
    end
    

    However, I would not recommend cleaning the tree. I have had some really bad experience on doing this. It's true that you get a cleaner structure which you can understand much better, since Treetop often keeps a lot of information you actually don't need, but you risk losing, for example, the possibility of referencing parsed expressions with its identifiers (custom labels or Automatically-Defined Element Accessor Methods for non-terminal symbols) (this is a webarchive link).

    Also, in some cases, cleaning a node just because it's class name is "Treetop::Runtime::SyntaxNode" is simply not correct, because in some cases you have to use a module, not a class, to extend your nodes, and, in that case, the node class name will still be "Treetop::Runtime::SyntaxNode", but the node will be cleaned from the tree, and you'll lose the mixed-in module functionality.

    Let me know if I was clear (unfortunately the documentation site seems to be down, it had quite a lot of useful examples I would like to show you, and since it's been a while I don't play with grammars I don't really remember it).