Search code examples
antlrantlr3antlrworks

Antlr AST generating (possible) madness


Is the following even possible? I want to "reverse" the input given to antlr and make each token a child of the previous one.

So, for the input (Assume each token is separated by the '.' char) :

Stack.Overflow.Horse

I would like my grammar to produce the following AST:

Horse
  |---Overflow
         |---Stack

So far, I've managed to reverse the nodes, but I'm unable to make them children of each other:

function
 : ID PERIOD function
   -> function ID
 | ID
 ;

ID  : 'a'..'z'*
    ;

Solution

  • I don't think there's an easy way to do that. You could make your rule like this:

    function
     : ID PERIOD function
       -> ^(function ID)
     | ID
     ;
    

    but that only makes the last node the root and all other nodes its children. For example, the following source:

    a.b.c.d.e
    

    will result in the following tree:

        e
     / / \ \
    d c   b a
    

    I can't see an easy fix since when you first parse a.b.c.d.e, a will be the ID and b.c.d.e the recursive call to function:

    a.b.c.d.e
    | +-----+
    |    |
    |    `-----> function
    |
    `----------> ID
    

    resulting in the fact that b.c.d.e will have a as its child. When then b becomes the ID, it too is added as a child next to a. In your case, a should be removed as a child and then added to the list of b's children. But AFAIK, that is not possible in ANLTR (at least, not in a clean way inside the grammar).


    EDIT

    Okay, as a work-around I had something elegant in mind, but that didn't work as I had hoped. So, as a less elegant solution, you could match the last node as the root in your rewrite rule:

    function
      :  (id '.')* last=id -> ^($last)
      ;
    

    and then collect all possible preceding nodes (children) in a List using the += operator:

    function
      :  (children+=id '.')* last=id -> ^($last)
      ;
    

    and use a custom member-method in the parser to "inject" these children into the root of your tree (going from right to left in your List!):

    function
      :  (children+=id '.')* last=id {reverse($children, (CommonTree)$last.tree);} -> ^($last)
      ;
    

    A little demo:

    grammar ReverseTree;
    
    options {
      output=AST;
    }
    
    tokens {
      ROOT;
    }
    
    @members {
      private void reverse(List nodes, CommonTree root) {
        if(nodes == null) return;
        for(int i = nodes.size()-1; i >= 0; i--) {
          CommonTree temp = (CommonTree)nodes.get(i);
          root.addChild(temp);
          root = temp;
        }
      }
    }
    
    parse
      :  function+ EOF -> ^(ROOT function+)
      ;
    
    function
      :  (children+=id '.')* last=id {reverse($children, (CommonTree)$last.tree);} -> ^($last)
      ;
    
    id 
      :  ID
      ;
    
    ID
      :  ('a'..'z' | 'A'..'Z')+
      ;
    
    Space
      :  ' ' {skip();}
      ;
    

    And a little test class:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;
    
    public class Main {
        public static void main(String[] args) throws Exception {
            ANTLRStringStream in = new ANTLRStringStream("a.b.c.d.e    Stack.Overflow.Horse    singleNode");
            ReverseTreeLexer lexer = new ReverseTreeLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            ReverseTreeParser parser = new ReverseTreeParser(tokens);
            ReverseTreeParser.parse_return returnValue = parser.parse();
            CommonTree tree = (CommonTree)returnValue.getTree();
            DOTTreeGenerator gen = new DOTTreeGenerator();
            StringTemplate st = gen.toDOT(tree);
            System.out.println(st);
        }
    }
    

    which will produce an AST that looks like:

    alt text

    For the input string:

    "a.b.c.d.e    Stack.Overflow.Horse    singleNode"