I'm new to Treetop and am attempting to write a CSS/HSS parser. HSS augments the basic functionality of CSS with nested styles, variables and a kind of mixin functionality.
I'm pretty close (the parser can handle plain CSS), but I fall down when it comes to implementing a style within a style, e.g.:
#rule #one {
  #two {
    color: red;
  }
  color: blue;
}
I've taken two shots at it, one which handles whitespace and one which doesn't. I can't quite get either to work. The Treetop documentation is a little sparse, and I really feel like I'm missing something fundamental. Hopefully someone can set me straight.
Attempt A:
grammar Stylesheet
  rule stylesheet
    space* style*
  end
  rule style
    selectors space* '{' space* properties? space* '}' space*
  end
  rule properties
    property space* (';' space* property)* ';'?
  end
  rule property
    property_name space* [:] space* property_value
  end
  rule property_name
    [^:;}]+
  end
  rule property_value
    [^:;}]+
  end
  rule space
    [\t ]
  end
  rule selectors
    selector space* ([,] space* selector)*
  end
  rule selector
    element (space+ ![{] element)*
  end
  rule element
    class / id
  end
  rule id
    [#] [a-zA-Z-]+
  end
  rule class
    [.] [a-zA-Z-]+
  end
end
Attempt B:
grammar Stylesheet
  rule stylesheet
    style*
  end
  rule style
    selectors closure
  end
  rule closure
    '{' ( style / property )* '}'
  end
  rule property
    property_name ':' property_value ';'
  end
  rule property_name
    [^:}]+
    <PropertyNode>
  end
  rule property_value
    [^;]+
    <PropertyNode>
  end
  rule selectors
    selector ( !closure ',' selector )*
    <SelectorNode>
  end
  rule selector
    element ( space+ !closure element )*
    <SelectorNode>
  end
  rule element
    class / id
  end
  rule id
    ('#' [a-zA-Z]+)
  end
  rule class
    ('.' [a-zA-Z]+)
  end
  rule space
    [\t ]
  end
end
Harness Code:
require 'rubygems'
require 'treetop'
class PropertyNode < Treetop::Runtime::SyntaxNode
  def value
    "property:(#{text_value})"
  end
end
class SelectorNode < Treetop::Runtime::SyntaxNode
  def value
    "--> #{text_value}"
  end
end
Treetop.load('css')
parser = StylesheetParser.new
parser.consume_all_input = false
string = <<EOS
#hello-there .my-friend {
font-family:Verdana;
font-size:12px;
}
.my-friend, #is-cool {
font: 12px Verdana;
#he .likes-jam, #very-much {asaads:there;}
hello: there;
}
EOS
root_node = parser.parse(string)
def print_node(node, output = [])
  output << node.value if node.respond_to?(:value)
  node.elements.each {|element| print_node(element, output)} if node.elements
  output
end
puts print_node(root_node).join("\n") if root_node
#puts parser.methods.sort.join(',')
puts parser.input
puts string[0...parser.failure_index] + '<--'
puts parser.failure_reason
puts parser.terminal_failures
I assume you're running into left-recursion problems? If so, keep in mind that Treetop produces recursive-descent parsers, and as such, you can't really use left recursion in your grammar. (This is one of the main reasons I still prefer ocamlyacc/ocamllex over Treetop despite its very sexy appearance.) It means you need to convert left-recursive forms to right recursion.

Since you undoubtedly own the Dragon Book (right?), I'll direct you to sections 4.3.3, 4.3.4, and 4.4.1, which cover the issue. As is typical, they're hard to understand, but parsers didn't get their reputation for nothing. There's also a nice left-recursion elimination tutorial that the ANTLR guys put up on the subject. It's somewhat ANTLR/ANTLRWorks-specific, but it's slightly easier to understand than what's found in the Dragon Book. This is one of those things that never makes a whole lot of sense to anyone who hasn't done it at least a few times before.
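To make the left-recursion point concrete, here is a minimal sketch of a hand-written recursive-descent parser in plain Ruby (not Treetop). The ExprParser class and the subtraction grammar are hypothetical illustrations, not taken from the question. A left-recursive rule such as `expr := expr '-' num | num` makes expr call itself before consuming any input, so it recurses until Ruby raises SystemStackError; rewriting it as `expr := num ('-' num)*` and folding the results to the left removes the recursion while keeping the expression left-associative.

```ruby
# Hypothetical example grammar (not from the question), showing the
# standard left-recursion rewrite for a recursive-descent parser.
#
#   Left-recursive (never terminates in recursive descent):
#     expr := expr '-' num | num
#   Rewritten with repetition instead:
#     expr := num ('-' num)*
class ExprParser
  def initialize(input)
    # Crude tokenizer: keeps only integers and '-', other characters are ignored.
    @tokens = input.scan(/\d+|-/)
    @pos = 0
  end

  def parse
    value = expr
    unless @pos == @tokens.size
      raise ArgumentError, "unexpected token #{@tokens[@pos].inspect}"
    end
    value
  end

  private

  # A literal translation of the left-recursive rule would begin with
  #   def expr; left = expr; ...
  # i.e. expr calls itself before consuming a token, recursing until
  # Ruby raises SystemStackError. The loop below consumes a number first.
  def expr
    value = num
    while @tokens[@pos] == '-'
      @pos += 1
      value -= num # fold to the left: (10 - 3) - 2, not 10 - (3 - 2)
    end
    value
  end

  def num
    tok = @tokens[@pos]
    unless tok && tok.match?(/\A\d+\z/)
      raise ArgumentError, "expected a number at token #{@pos}"
    end
    @pos += 1
    tok.to_i
  end
end

puts ExprParser.new("10 - 3 - 2").parse # prints 5
```

A naive right-recursive rewrite (`expr := num '-' expr | num`) would also terminate, but it would compute 10 - (3 - 2) = 9; the loop-and-fold form keeps the left associativity of the original rule.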
Also, a minor comment: if you're going to use Treetop, I recommend defining whitespace like this instead:
rule ws
  [\t ]*
end
You're unlikely to ever need to match a single whitespace character, and almost every grammar rule is going to need optional whitespace, so it makes sense to give the rule a very short name. Incidentally, this is one of the advantages of a separate lexing step: the lexer can discard whitespace once, so the grammar rules never have to mention it.
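The separate-lexing point is easier to see in a hand-written parser. Below is a minimal sketch in plain Ruby (standard library only, no Treetop) of a lexing step that drops all whitespace, including newlines, followed by a small recursive-descent parser that handles the nested-style example from the question. The names tokenize and NestedStyleParser, the plain-string tokens and the hash-based output are all hypothetical. It is deliberately limited: pseudo-selectors such as a:hover, commas inside property values (font-family: Arial, sans-serif) and HSS variables or mixins are not handled.

```ruby
require 'strscan'

# Lexer: all whitespace (spaces, tabs, newlines) is discarded here, once,
# so the parser below never has to mention it.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)
    if (punct = scanner.scan(/[{}:;,]/))
      tokens << punct
    else
      tokens << scanner.scan(/[^\s{}:;,]+/) # selector part, name or value word
    end
  end
  tokens
end

# Recursive-descent parser over the token list (hypothetical, not Treetop):
#   stylesheet := style*
#   style      := words (',' words)* '{' (style | property)* '}'
#   property   := words ':' words ';'?   (';' optional before '}')
class NestedStyleParser
  PUNCT = %w[{ } : ; ,].freeze

  def initialize(tokens)
    @tokens = tokens
    @pos = 0
  end

  def parse_stylesheet
    styles = []
    styles << parse_style while peek
    styles
  end

  private

  def parse_style
    selectors = [parse_words]
    while peek == ','
      @pos += 1
      selectors << parse_words
    end
    expect('{')
    body = []
    until peek == '}'
      raise ArgumentError, 'unterminated block' if peek.nil?
      # Decide between a nested style and a property by looking ahead.
      body << (nested_style_ahead? ? parse_style : parse_property)
    end
    expect('}')
    { selectors: selectors, body: body }
  end

  def parse_property
    name = parse_words
    expect(':')
    value = parse_words
    @pos += 1 if peek == ';'
    { property: name, value: value }
  end

  # Consumes a run of non-punctuation tokens, e.g. "#he .likes-jam" or "12px Verdana".
  def parse_words
    start = @pos
    @pos += 1 while peek && !PUNCT.include?(peek)
    raise ArgumentError, "expected a name or value at token #{start}" if @pos == start
    @tokens[start...@pos].join(' ')
  end

  # A nested style reaches '{' before any ':', ';' or '}'.
  def nested_style_ahead?
    @tokens.drop(@pos).each do |tok|
      return true if tok == '{'
      return false if [':', ';', '}'].include?(tok)
    end
    false
  end

  def expect(tok)
    raise ArgumentError, "expected #{tok.inspect} at token #{@pos}, got #{peek.inspect}" unless peek == tok
    @pos += 1
  end

  def peek
    @tokens[@pos]
  end
end

css = <<~CSS
  #rule #one {
    #two {
      color: red;
    }
    color: blue;
  }
CSS
pp NestedStyleParser.new(tokenize(css)).parse_stylesheet
```

Because whitespace never reaches the parser, deciding between a nested style and a property only requires checking whether '{' appears before ':', ';' or '}'.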