ruby, parsing, antlr, peg

Is there a parser generator for ruby that can generate a parser with no gem dependencies?


In an effort to make a DSL I have written backwards-compatible with ruby 1.8, I need to do some (relatively straightforward) parsing on the source strings. I could probably do this directly with string munging, but in the interest of future maintainability I wanted to first investigate what it would take to use a proper parser generator.

The role of this DSL, however, puts an unusual constraint on what ruby gems I can use. The DSL is part of an Xcode project that's distributed with CocoaPods, and CocoaPods is not really about managing ruby dependencies in the build environment.

What this means is, my ruby DSL is effectively restricted to the gems that ship pre-installed on Mac OS X 10.8.

So, my question: Is there a ruby parser generator out there that generates "stand-alone" ruby code as its final output? By that I mean ruby code that does not require anything that's not part of core ruby.

I have looked at the (sparse) documentation for ANTLR for Ruby, but it (understandably) does not address my question. And from my quick glimpse at treetop, it does seem to use a support package bundled as a gem.


Solution

  • After further searching I came across the rexical gem, which is itself a renamed-and-slightly-maintained version of rex. This is an old-school lexer generator whose only dependency is racc/parser, which has been part of ruby-core for long enough that I don't have to worry about it.

    The documentation is sparse, but there were enough blog posts touching on the topic that I was able to get what I needed working.

    In case you're curious enough to have read this far, here is my example .rex specification:

    require 'generator'
    
    class OptionSpecsLexer
    rules
      \d+(\.\d*)?           { [:number, text] }
      \w+:                  { [:syntax_hash_key, ":#{text[0, text.length - 1]} =>"] }
      \:\w+                 { [:symbol, text] }
      \w+\(                 { [:funcall_open_paren, text] }
      \w+                   { [:identifier, text] }
      \"(\\.|[^\\"])*\"     { [:string, text] }
      =>                    { [:rocket, text] }
      ,                     { [:comma, text] }
      \{                    { [:open_curly, text] }
      \}                    { [:close_curly, text] }
      \(                    { [:open_paren, text] }
      \)                    { [:close_paren, text] }
      \[                    { [:open_square, text] }
      \]                    { [:close_square, text] }
      \\\s+                 { }
      \n                    { [:eol, text] }
      \s+                   { }
    
    inner
    
      def enumerate_tokens
        Generator.new { |token|
          loop {
            t = next_token
            break if t.nil?
            token.yield(t)
          }
        }
      end
    
      def normalize(source)
        scan_setup source
        out = ""
        enumerate_tokens.each do |token|
          out += ' ' + token[1]
        end
        out
      end
    
    end
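
The `generator` library required above exists only in ruby 1.8; it was removed in 1.9, where `Enumerator` covers the same use case. As a rough sketch of how `enumerate_tokens` might look on a newer ruby: the `FakeLexer` class and its canned token list are hypothetical stand-ins for the generated lexer, used here only so the snippet runs on its own.

```ruby
# Hypothetical stand-in for the rex-generated lexer: next_token simply
# hands out tokens from a fixed list and returns nil when it runs out.
class FakeLexer
  def initialize(tokens)
    @tokens = tokens.dup
  end

  def next_token
    @tokens.shift
  end

  # Same idea as the Generator-based version, using Enumerator (ruby 1.9+).
  def enumerate_tokens
    Enumerator.new do |yielder|
      while (t = next_token)
        yielder << t
      end
    end
  end
end

lexer = FakeLexer.new([[:identifier, "foo"], [:comma, ","]])
lexer.enumerate_tokens.each { |type, text| puts "#{type}: #{text}" }
```

Since the DSL described here has to run on 1.8, the `Generator` version in the specification is the one that applies in that setting.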
    

    This lexer understands just enough ruby syntax to preprocess specifications written in my vMATCodeMonkey DSL, replacing the new keyword-style hash key syntax with the old rocket operator syntax. [This was done to allow vMATCodeMonkey to work on an un-updated Mac OS X 10.8, which still ships with a deprecated version of ruby.]
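
To make the transformation concrete, here is a minimal hand-written sketch of the same normalization using only `strscan` from the standard library. It is not the code that rex generates. The `RULES` table and this `normalize` method are illustrative names that mirror the rule order of the specification above (first match wins), with the punctuation rules collapsed into one pattern and the number rule assumed to accept plain integers.

```ruby
require 'strscan'

keep = lambda { |t| t }

# Each entry pairs a pattern with an action; a nil action means "skip".
# Order matters: the first pattern that matches at the current position wins.
RULES = [
  [/\d+(\.\d*)?/,       keep],                                        # number
  [/\w+:/,              lambda { |t| ":#{t[0, t.length - 1]} =>" }],  # key: becomes :key =>
  [/:\w+/,              keep],                                        # symbol
  [/\w+\(/,             keep],                                        # funcall open paren
  [/\w+/,               keep],                                        # identifier
  [/"(\\.|[^\\"])*"/,   keep],                                        # string
  [/=>/,                keep],                                        # rocket
  [/[,{}()\[\]]/,       keep],                                        # punctuation
  [/\\\s+/,             nil],                                         # line continuation
  [/\n/,                keep],                                        # end of line
  [/\s+/,               nil]                                          # other whitespace
]

def normalize(source)
  scanner = StringScanner.new(source)
  out = []
  until scanner.eos?
    # scan only matches at the current position and advances on success
    rule = RULES.find { |pattern, _action| scanner.scan(pattern) }
    raise ArgumentError, "cannot tokenize at offset #{scanner.pos}" if rule.nil?
    action = rule[1]
    out << action.call(scanner.matched) if action
  end
  out.join(' ')
end

puts normalize('option(name: "x", count: 3)')
# prints: option( :name => "x" , :count => 3 )
```

Unlike the rex version, this sketch joins tokens with single spaces rather than prefixing each token with one, so the output has no leading space.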