I have this working pair of rules in Treetop that the perfectionist in me believes should be a single rule, or at least something more elegant:
rule _
  crap
  /
  " "*
end
rule crap
  " "* "\\x0D\\x0A"* " "*
end
I'm parsing some expressions that every now and then end up with "\x0D\x0A". Yeah, not "\r\n" but the literal text "\x0D\x0A". Something was double escaped at some point. Long story.
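To make the distinction concrete: in Ruby the double-escaped text is eight literal characters, while a real CR LF pair is only two bytes. The eight-character text is what the "\\x0D\\x0A" terminal in these rules is written to match. A quick check in plain Ruby (the variable names are just for illustration):

```ruby
# Single quotes keep "\x" as a literal backslash followed by "x".
double_escaped = '\x0D\x0A'
# Double quotes interpret \x0D and \x0A as the CR and LF bytes.
real_crlf = "\x0D\x0A"

puts double_escaped.length          # prints 8
puts real_crlf.length               # prints 2
puts real_crlf == "\r\n"            # prints true
puts double_escaped == "\\x0D\\x0A" # prints true: same text, backslashes escaped
```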
That rule works, but it's ugly and it bothers me. I tried this:
rule _
  " "* "\\x0D\\x0A"* " "*
  /
  " "*
end
which caused:
SyntaxError: (eval):1276:in `load_from_string': compile error
(eval):1161: class/module name must be CONSTANT
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:42:in `load_from_string'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:35:in `load'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `open'
from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `load'
Ideally I would like to actually write something like:
rule _
(" " | "\\x0D\\x0A")*
end
but that doesn't work, and while we are at it, I also discovered that you can't have only one * per rule:
rule _
  " "*
  /
  "\n"*
end
that will match " ", but never "\n".
I see you're using three different OR characters: /, | and \ (of which only the first, /, means OR in Treetop, where it is an ordered choice).
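Ordered choice also explains the " "* / "\n"* puzzle: / tries its alternatives in order and commits to the first one that succeeds, and " "* always succeeds, even when it consumes nothing, so "\n"* is never tried. Below is a sketch in plain Ruby that mimics those semantics; lit, star and choice are hypothetical helpers for illustration, not part of Treetop's API:

```ruby
# Hypothetical helpers mimicking PEG semantics (not Treetop's API).
# Each parser takes (input, pos) and returns the new position, or nil on failure.

# Match a literal string at pos.
def lit(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# Zero-or-more repetition: always succeeds, possibly consuming nothing.
def star(parser)
  lambda do |input, pos|
    while (nxt = parser.call(input, pos)) && nxt > pos
      pos = nxt
    end
    pos
  end
end

# Ordered choice, like Treetop's "/": the first alternative that succeeds wins
# and the remaining alternatives are never tried.
def choice(*alternatives)
  lambda do |input, pos|
    alternatives.each do |alt|
      result = alt.call(input, pos)
      return result if result
    end
    nil
  end
end

broken = choice(star(lit(" ")), star(lit("\n"))) # like: " "* / "\n"*
fixed  = star(choice(lit(" "), lit("\n")))       # like: (" " / "\n")*

p broken.call("\n\n", 0) # => 0, " "* matched the empty string, so "\n"* never ran
p fixed.call("\n\n", 0)  # => 2
p fixed.call(" \n ", 0)  # => 3
```

Moving the * outside the choice, so that each repetition picks one alternative, is the same shape as the working (" " / "\\x0D\\x0A")* rule below.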
This works fine:
grammar Language
  rule crap
    (" " / "\\x0D\\x0A")* {
      def value
        text_value
      end
    }
  end
end
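#!/usr/bin/env ruby
require 'rubygems'
require 'treetop'
require 'polyglot'  # lets require find and compile .treetop grammar files
require 'language'  # loads language.treetop, which defines LanguageParser

parser = LanguageParser.new
# In single quotes '\\' is one literal backslash, so this is the double-escaped text
value = parser.parse(' \\x0D\\x0A \\x0D\\x0A ').value
print '>' + value + '<'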
prints:
> \x0D\x0A \x0D\x0A <
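If you want to check outside Treetop which strings the crap rule accepts, an anchored Ruby regex should give the same answer: the two alternatives begin with different characters (a space and a backslash), so ordered choice and regex alternation should not disagree here. This sketch assumes the parser must consume the whole input, which is Treetop's default; the CRAP constant and crap? method are hypothetical names:

```ruby
# Hypothetical cross-check: an anchored regex mirroring the Treetop rule
#   (" " / "\\x0D\\x0A")*
# Inside the regex, \\ matches one literal backslash.
CRAP = /\A(?: |\\x0D\\x0A)*\z/

def crap?(text)
  CRAP.match?(text)
end

p crap?(' \x0D\x0A \x0D\x0A ') # => true
p crap?("\r\n")                # => false, real CR LF bytes are not the escaped text
p crap?('\x0D')                # => false, only half of the sequence
p crap?('')                    # => true, * allows zero repetitions
```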