Tags: ruby-on-rails, ruby, parsing, treetop

Getting date parts from a simple treetop parser: wrong argument type Class (expected Module)


For the following treetop grammar, parsing '3/14/01' (via t = Parser.parse('3/14/01') in irb) raises "TypeError: wrong argument type Class (expected Module)".

grammar SimpleDate

  rule dateMDY
      whitespace? month_part ( '/' / '-') day_part ( ('/' / '-') year_part)? whitespace?  <DateMDY>
  end

  rule month_part
    ( ( '1' [0-2] ) / ( '0'? [1-9] ) )  <MonthLiteral>
  end

  rule day_part
    ( ( [12] [0-9] ) / ( '3' [0-1] ) / ( '0'? [1-9] ) ) <DayLiteral>
  end

  rule year_part
    ( ( '1' '9' ) / ( '2' [01] ) )? [0-9] [0-9]   <YearLiteral>   # 1900 through 2199 (4 digit)
  end

  rule whitespace
    [\s]+
  end

end

First, if I comment out the <MonthLiteral> and <DayLiteral> class references, all is well. Commenting out <DateMDY> while leaving those two Literal references in still raises the error. Commenting out <YearLiteral> makes no difference (the parse works or fails regardless); that seems to indicate that because the first two are non-terminal, I can't produce elements for them.

There is clearly something I'm not appreciating about how Ruby (or treetop) instantiates these classes, or about AST generation, that would explain this behavior. Can you explain, or point me to something that helps me understand, why <MonthLiteral> or <DayLiteral> can't have objects generated?

Second, and this may be a bridge too far: what I'd really prefer is a DateMDY object with three attributes (month, day, and year), so that a to_time method in DateMDY can readily produce a Ruby Time object. For now, though, I'd settle for producing the constituent pieces as objects.
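A minimal sketch of such an object, written as a plain Ruby class rather than as a parser node, might look like the following. The (year, month, day) argument order is chosen to match the DateMDY.new call in the answer below; the default_year parameter and the rule that a year below 100 means 2000 + year are assumptions made for illustration only, since the grammar accepts two-digit years without saying how to interpret them.

```ruby
# Hypothetical plain-Ruby value object (not a Treetop SyntaxNode).
# year may be nil when the input omits it, e.g. "3/14".
class DateMDY
  attr_reader :year, :month, :day

  def initialize(year, month, day)
    @year = year
    @month = month
    @day = day
  end

  # Builds a local Time at midnight on this date.
  # Assumptions (not from the original grammar): a missing year falls back
  # to default_year, and a year below 100 is read as 2000 + year.
  def to_time(default_year = Time.now.year)
    y = year || default_year
    y += 2000 if y < 100
    Time.local(y, month, day)
  end
end

d = DateMDY.new(1, 3, 14)
puts d.to_time.strftime("%Y-%m-%d") # prints 2001-03-14
```

Because Treetop calls new(input, interval, elements) on a class named in a sequence rule, a class like this should not also be declared as <DateMDY> in the grammar; it is meant to be built from the parse tree after parsing.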

So I tried keeping <DateMDY> as the object and commenting out the references to <MonthLiteral>, <DayLiteral>, and <YearLiteral>. The AST object returned from .parse (t in my original example) has two public methods, :day_part and :month_part, but they seem to be nil when I invoke them (say, puts t.day_part), and there is no :year_part method at all, so that didn't help.

Is it possible to have DateMDY somehow access its constituent parts?

FYI, the Parser code I'm using is standard material from the treetop tutorials, and the node_extensions.rb that defines the node classes is also trivial, but I can post both if you need to see them.

Thanks! Richard


Solution

  • The error message is telling you exactly what you did wrong. There's only a restricted set of places where you can name a Class in a node declaration like <MonthLiteral>, and where it's allowed, the Class must be a subclass of SyntaxNode. Normally, however, you should use a Module, which is extend()ed into the SyntaxNode that has been created by an inner rule. What makes YearLiteral different is that its rule is a plain sequence, whereas the MonthLiteral and DayLiteral rules consist of a parenthesised choice ( ... / ... ). A choice doesn't build a new node; it returns the existing SyntaxNode produced by whichever alternative matched. An existing object can only be extend()ed with a Module, not with another Class, so you get the TypeError.

    As for your second question, the DateMDY object you want should almost certainly not be a SyntaxNode: every SyntaxNode retains references to all of its child SyntaxNodes and to the input string. These are the parser's internals; do you really want to expose them to the outside world?

    Instead, arrange for the appropriate syntax node to be visited after the parse has completed, by calling a method that returns your domain object, constructed from the substrings that these parser nodes identified and saved. It's best to add these methods so that they traverse down from your topmost rule, rather than trying to walk the parse tree "from the outside".

    You can do this by adding a block to your top rule, like this (assuming you have an appropriate DateMDY class). Once you have a successful parse tree, get your DateMDY by calling tree.result:

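    rule dateMDY
      whitespace? month_part ( '/' / '-') day_part y:( ('/' / '-') year_part)? whitespace?
      {
        # Builds a plain DateMDY domain object from the matched text.
        # y labels the optional "/year" suffix; it is an empty node when no year was given.
        def result
          DateMDY.new(y.empty? ? nil : y.year_part.text_value.to_i,
            month_part.text_value.to_i,
            day_part.text_value.to_i)
        end
      }
    end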
    

    Of course, it's cleaner to add separate result methods for year_part, month_part, and day_part; this is just an introduction to adding such methods.
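To illustrate both halves of the answer, here is a sketch of what node_extensions.rb could contain if the literal rules' extensions are declared as modules, each with its own value method in the spirit of the separate result methods suggested above. The module bodies are illustrative, not the asker's actual code, and FakeNode is a hypothetical stand-in for Treetop's SyntaxNode (it only models text_value) so the sketch runs without the treetop gem. The last lines reproduce the question's TypeError in plain Ruby by extend()ing an object with a Class.

```ruby
# Sketch of node extensions declared as Modules (names from the question).
# A choice rule's result node is extend()ed with the declared name,
# so it has to be a Module rather than a Class.
module MonthLiteral
  # Numeric month, e.g. "03" -> 3
  def value
    text_value.to_i
  end
end

module DayLiteral
  # Numeric day, e.g. "14" -> 14
  def value
    text_value.to_i
  end
end

module YearLiteral
  # Numeric year as written, e.g. "01" -> 1 or "2001" -> 2001
  def value
    text_value.to_i
  end
end

# Hypothetical stand-in for Treetop::Runtime::SyntaxNode; only text_value is modelled.
FakeNode = Struct.new(:text_value)

month = FakeNode.new("03").extend(MonthLiteral)
puts month.value # prints 3

# Hypothetical Class standing in for a class-based DayLiteral.
BrokenDayLiteral = Class.new

begin
  # Extending an existing object with a Class raises TypeError, as in the question.
  FakeNode.new("14").extend(BrokenDayLiteral)
rescue TypeError => e
  puts e.message
end
```

With modules like these declared as <MonthLiteral>, <DayLiteral>, and <YearLiteral>, the result method in the rule above could call month_part.value, day_part.value, and y.year_part.value instead of repeating text_value.to_i.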