I have simplified a grammar expressed in Treetop, and I am trying to filter the parser's output into an AST, using custom nodes.
grammar Elem
rule top
lpar 'top' space
args_:(lpar 'args' space ((ident / number) space?)* rpar) space?
rpar <Top>
end
rule ident
[a-zA-Z] [a-zA-Z0-9_]* <Ident>
end
rule number
[0-9]+ <Number>
end
rule space
[\s]+
end
rule lpar
space? '(' space?
end
rule rpar
space? ')' space?
end
end
Basically, it can parse the following example :
(top (args foo bar 42))
The custom nodes all inherit Treetop::Runtime::SyntaxNode
Now, I need to filter the parse tree generated by Treetop into an AST.
I follow the strategy explained here , without success : my AST is just empty...
My compiler driver is the following :
require 'treetop'
require 'pp'
require_relative 'elem'
require_relative 'node_extension'
class ElemCompiler
def initialize
@parser=ElemParser.new
end
def compile filename
puts "==> compiling #{filename}"
@ast=parse(filename)
puts "==> AST in memory. Good."
end
def parse filename
pp tree=@parser.parse(IO.read(filename))
pp clean(tree)
end
private
def clean(root_node)
return if(root_node.elements.nil?)
pp root_node.elements.collect {|node| node.class.name =="Treetop::Runtime::SyntaxNode" }
pp root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
root_node.elements.each {|node| clean(node) }
end
end
compiler=ElemCompiler.new.parse ARGV[0]
What am I missing ?
Your code indeed parses correctly the supplied expression.
There is a small error, though, in the clean method:
def clean(root_node)
return if(root_node.elements.nil?)
pp root_node.elements.collect {|node| node.class.name =="Treetop::Runtime::SyntaxNode" }
pp root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
root_node.elements.each {|node| clean(node) }
end
The clean method is returning the last expression evaluated, which is the each method of the array elements. What you want to return, really, is the root node, so the line pp clean(tree)
would actually print the resulting, clean tree, not the result of the each expression.
You can solve in two ways, one is adding root_node as return expression:
def clean(root_node)
(...)
pp root_node.elements.delete_if{|node| node.class.name == "Treetop::Runtime::SyntaxNode" }
root_node # here
end
Or you can change the parse method to the following:
def parse filename
pp tree = @parser.parse(IO.read(filename))
clean(tree) # we clean the tree
pp tree # since tree is an object, side-effects will persist here
end
However, I would not recommend cleaning the tree. I have had some really bad experience on doing this. It's true that you get a cleaner structure which you can understand much better, since Treetop often keeps a lot of information you actually don't need, but you risk losing, for example, the possibility of referencing parsed expressions with its identifiers (custom labels or Automatically-Defined Element Accessor Methods for non-terminal symbols) (this is a webarchive link).
Also, in some cases, cleaning a node just because it's class name is "Treetop::Runtime::SyntaxNode" is simply not correct, because in some cases you have to use a module, not a class, to extend your nodes, and, in that case, the node class name will still be "Treetop::Runtime::SyntaxNode", but the node will be cleaned from the tree, and you'll lose the mixed-in module functionality.
Let me know if I was clear (unfortunately the documentation site seems to be down, it had quite a lot of useful examples I would like to show you, and since it's been a while I don't play with grammars I don't really remember it).