ruby treetop

I believe this should be one rule in Treetop


I have this working pair of rules in Treetop that the perfectionist in me believes should be a single rule, or at least something more elegant:

rule _
  crap
  /
  " "*
end

rule crap
  " "* "\\x0D\\x0A"* " "*
end

I'm parsing some expressions that every now and then end up containing "\x0D\x0A". Yes, not "\r\n" but the literal characters "\x0D\x0A". Something was double-escaped at some point. Long story.
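
To make the difference concrete, here is a quick plain-Ruby check. The variable names are only for illustration:

```ruby
# A real CR+LF pair: two characters.
real_crlf = "\x0D\x0A"

# The double-escaped form found in the input: eight characters,
# starting with a literal backslash followed by "x0D".
escaped = "\\x0D\\x0A"

puts real_crlf == "\r\n"        # true
puts real_crlf.length           # 2
puts escaped.length             # 8
puts escaped == '\x0D\x0A'      # true (single quotes keep the backslashes)
```

This is why the grammar uses "\\x0D\\x0A" rather than "\r\n" as the terminal.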

That rule works, but it's ugly and it bothers me. I tried this:

rule _
  " "* "\\x0D\\x0A"* " "*
  /
  " "*
end

which caused

SyntaxError: (eval):1276:in `load_from_string': compile error
(eval):1161: class/module name must be CONSTANT
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:42:in `load_from_string'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:35:in `load'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `open'
    from /.../gems/treetop-1.4.9/lib/treetop/compiler/grammar_compiler.rb:32:in `load'

Ideally I would like to actually write something like:

rule _
  (" " | "\\x0D\\x0A")*
end

but that doesn't work. While we are at it, I also discovered that you apparently can't have more than one * per rule:

rule _
  " "*
  /
  "\n"*
end

That will match " ", but never "\n".


Solution

  • I see you're using three different OR chars: /, | and \ (of which only the first means OR).

    This works fine:

    grammar Language
    
      rule crap
        (" " / "\\x0D\\x0A")* {
          def value
          text_value
          end
        }
      end
    
    end

    It can be loaded and run with this script:

    #!/usr/bin/env ruby
    
    require 'rubygems'
    require 'treetop'
    require 'polyglot'
    require 'language'
    
    parser = LanguageParser.new
    value = parser.parse(' \\x0D\\x0A   \\x0D\\x0A   ').value
    print '>' + value + '<'
    

    prints:

    > \x0D\x0A   \x0D\x0A   <
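
As for the last grammar in the question (`" "* / "\n"*`), the likely reason it never matches "\n" is not a limit on the number of `*` operators. Under PEG ordered choice, `" "*` succeeds even when it matches nothing, so the `"\n"*` alternative is never tried, and the parse then fails because input is left over. Below is a minimal plain-Ruby sketch of those semantics. It assumes Treetop follows standard PEG ordered choice and its default of requiring all input to be consumed. The helpers (`literal`, `zero_or_more`, `ordered_choice`, `full_match?`) are hypothetical and not part of Treetop:

```ruby
require 'strscan'

# Match one literal string at the current position; true on success.
def literal(scanner, text)
  !scanner.skip(/#{Regexp.escape(text)}/).nil?
end

# PEG zero-or-more: repeat the block until it fails. Always succeeds,
# even when nothing was consumed.
def zero_or_more(scanner)
  loop { break unless yield(scanner) }
  true
end

# PEG ordered choice: commit to the first alternative that succeeds.
# (Failing alternatives in this sketch never consume input.)
def ordered_choice(scanner, *alternatives)
  alternatives.any? { |alt| alt.call(scanner) }
end

# Succeed only if the expression matches AND the whole input is consumed.
def full_match?(input)
  scanner = StringScanner.new(input)
  yield(scanner) && scanner.eos?
end

# " "* / "\n"*   (the first alternative can never fail)
TWO_STARS = lambda do |s|
  ordered_choice(s,
                 ->(x) { zero_or_more(x) { |y| literal(y, " ") } },
                 ->(x) { zero_or_more(x) { |y| literal(y, "\n") } })
end

# (" " / "\n")*  (the choice sits inside the repetition)
ONE_STAR = lambda do |s|
  zero_or_more(s) do |x|
    ordered_choice(x, ->(y) { literal(y, " ") }, ->(y) { literal(y, "\n") })
  end
end

puts full_match?("  ", &TWO_STARS)      # true
puts full_match?("\n", &TWO_STARS)      # false
puts full_match?(" \n \n", &ONE_STAR)   # true
```

Putting the choice inside the repetition, as in the `crap` rule above, avoids the problem because each iteration gets to try both alternatives.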