Tags: java, grammar, abstract-syntax-tree, antlr3, parse-tree

Antlr3: building parse tree for qualified names


I couldn't find a question/answer that comes close to helping with my issue. Therefore, I am posting this question here.

I am trying to build a parse tree for qualified names, for example:

  1. foo_boo.aaa.ccc1_c

Here I have dot-separated words. I am using ANTLR 3, and my grammar is below.

parse
    :  expr
    ;


list_expr : <I removed the grammar here>
SimpleType : ('a'..'z'|'A'..'Z'|'_')('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
           ;

QualifiedType : SimpleType | SimpleType ('\.' SimpleType)+;


expr : list_expr
    | QualifiedType
    | union_expr;

/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/

WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;

Here, SimpleType is the rule for a single word. I need a rule for QualifiedType. The current rule (QualifiedType : SimpleType | SimpleType ('\.' SimpleType)+;) does not work as expected. How do I write a correct grammar for qualified names (dot-separated words)?


Solution

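  • Make QualifiedType a parser rule instead of a lexer rule (in ANTLR, rule names that start with a lowercase letter are parser rules, while names that start with an uppercase letter are lexer rules):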

    qualifiedType : SimpleType ('.' SimpleType)*;
    

    Also, the dot does not need to be escaped: write '.' instead of '\.'.

    EDIT

    You'll have to set the output to AST and apply some tree rewrite rules to make it work properly. Here's a quick demo:

    grammar T;
    
    options {
      output=AST;
    }
    
    tokens {
      Root;
      QualifiedName;
    }
    
    parse
     : qualifiedType EOF -> ^(Root qualifiedType)
     ;
    
    qualifiedType
     : SimpleType ('.' SimpleType)* -> ^(QualifiedName SimpleType+)
     ;
    
    SimpleType
     : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_')*
     ;
    

    And if you now run the code:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.CommonTree;
    import org.antlr.runtime.tree.DOTTreeGenerator;
    import org.antlr.stringtemplate.StringTemplate;
    
    public class Main {
        public static void main(String[] args) throws Exception {
            TLexer lexer = new TLexer(new ANTLRStringStream("foo_boo.aaa.ccc1_c"));
            TParser parser = new TParser(new CommonTokenStream(lexer));
            CommonTree tree = (CommonTree)parser.parse().getTree();
            DOTTreeGenerator gen = new DOTTreeGenerator();
            StringTemplate st = gen.toDOT(tree);
            System.out.println(st);
        }
    }
    

    you'll get some DOT output, which corresponds to the following AST: Root at the top, with a single QualifiedName child whose children are the three SimpleType tokens foo_boo, aaa and ccc1_c.
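
For readers who cannot render the DOT output, the tree shape implied by the two rewrite rules can be sketched in plain Java without ANTLR. This is only an illustration under stated assumptions: the class QualifiedNameSketch and its methods are hypothetical names, not part of the ANTLR runtime or the generated TParser, and the bracketed string is just a LISP-style way of writing the tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, NOT generated by ANTLR: it only mimics the tree shape
// described by ^(Root qualifiedType) and ^(QualifiedName SimpleType+).
final class QualifiedNameSketch {
    // Same character classes as the SimpleType lexer rule in the grammar.
    private static final Pattern SIMPLE_TYPE =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Splits on '.' and checks that every part looks like a SimpleType.
    static List<String> simpleTypes(String input) {
        List<String> parts = new ArrayList<>();
        // Limit -1 keeps empty trailing parts, so "a." is rejected below.
        for (String part : input.split("\\.", -1)) {
            if (!SIMPLE_TYPE.matcher(part).matches()) {
                throw new IllegalArgumentException("not a SimpleType: '" + part + "'");
            }
            parts.add(part);
        }
        return parts;
    }

    // The dots are dropped, just as the rewrite rule keeps only SimpleType+.
    static String toTreeString(String input) {
        return "(Root (QualifiedName " + String.join(" ", simpleTypes(input)) + "))";
    }

    public static void main(String[] args) {
        // prints: (Root (QualifiedName foo_boo aaa ccc1_c))
        System.out.println(toTreeString("foo_boo.aaa.ccc1_c"));
    }
}
```

The point of the sketch is that the '.' tokens take part in matching but do not appear in the resulting tree, which is why the rewrite rule lists only SimpleType+ under QualifiedName.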