Tags: antlr, antlrworks

complex AST rewrite rule in ANTLR


This follows up on my earlier question, AST rewrite rule with " * +" in antlr, which was about AST rewrite rules using the divide-group technique.

I'm having trouble with AST generation in ANTLR again :). Here is my ANTLR code:

start   :   noun1+=n (prep noun2+=n (COMMA noun3+=n)*)*
        ->  ^(NOUN $noun1) (^(PREP prep) ^(NOUN $noun2) ^(NOUN $noun3)*)*
    ;
n       :    'noun1'|'noun2'|'noun3'|'noun4'|'noun5';
prep    :    'and'|'in';
COMMA   :     ',';

Now, with the input "noun1 and noun2, noun3 in noun4, noun5", I got the following unexpected AST:

enter image description here

Compare with the "Parse Tree" in ANTLRWorks:

enter image description here

I think the $noun3 variable holds the list of every "n" matched by "COMMA noun3+=n". Consequently, the rewrite ^(NOUN $noun3)* emits all of those "n" nodes without separating which "n" actually belongs to which "prep".

Is there any way to get that separation in "(^(PREP prep) ^(NOUN $noun2) ^(NOUN $noun3))"? All I want is for the AST to match the "Parse Tree" in ANTLRWorks exactly, but without the COMMA tokens.

Thanks for the help!


Solution

  • Getting the separation that you want is easiest if you break up the start rule. Here's an example (without writing COMMAs to the AST):

    start   :   prepphrase             //one prepphrase is required.
                (COMMA! prepphrase)*   //"COMMA!" means "match a COMMA but don't write it to the AST"
            ;
    
    prepphrase: noun1=n                //You can use "noun1=n" instead of "noun1+=n" when you're only using it to store one value
                (prep noun2=n)? 
                -> ^(NOUN $noun1) ^(PREP prep)? ^(NOUN $noun2)?
            ;
    

    A prepphrase is a noun that may be followed by a preposition with another noun. The start rule looks for comma-separated prepphrases.

    The output appears like the parse tree image, but without the commas.
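
    To try the rules above in ANTLRWorks, they need to sit inside a complete grammar. The following is a minimal sketch, assuming ANTLR 3 with output=AST. The grammar name NounPhrases, the tokens block declaring the imaginary NOUN and PREP tokens, the trailing EOF!, and the WS rule are assumptions, since the question does not show those parts of its grammar.

    ```
    grammar NounPhrases;               // hypothetical grammar name

    options {
        output = AST;                  // required for ^(...) rewrites and the "!" operator
    }

    tokens {
        NOUN;                          // imaginary tokens used only as AST roots (assumed)
        PREP;
    }

    start   :   prepphrase
                (COMMA! prepphrase)*   // commas are matched but left out of the AST
                EOF!                   // assumed: force the whole input to be consumed
            ;

    prepphrase
            :   noun1=n
                (prep noun2=n)?
                -> ^(NOUN $noun1) ^(PREP prep)? ^(NOUN $noun2)?
            ;

    n       :   'noun1' | 'noun2' | 'noun3' | 'noun4' | 'noun5' ;
    prep    :   'and' | 'in' ;
    COMMA   :   ',' ;
    WS      :   (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; } ;   // assumed: skip whitespace
    ```

    With the question's input, this should yield a flat list of subtrees: (NOUN noun1) (PREP and) (NOUN noun2) (NOUN noun3) (PREP in) (NOUN noun4) (NOUN noun5).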


    If you prefer explicitly writing out ASTs with -> or if you don't like syntax like COMMA!, you can write the start rule like this instead. The two different forms are functionally equivalent.

    start   :   prepphrase             //one prepphrase is required.
                (COMMA prepphrase)*
                -> prepphrase+         //write each prepphrase, which doesn't include commas
            ;
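
    If the grouping from the question matters (each preposition followed by one or more comma-separated nouns), the same idea of splitting the rule can be applied to that shape instead. This is only a sketch under the same assumptions as before (ANTLR 3, output=AST, imaginary NOUN and PREP tokens); the rule name prepgroup and the label first are hypothetical.

    ```
    start       :   first=n prepgroup*
                    -> ^(NOUN $first) prepgroup*
                ;

    prepgroup   :   prep n (COMMA n)*
                    -> ^(PREP prep) ^(NOUN n)+     // one NOUN subtree per n matched in this group
                ;
    ```

    Each invocation of prepgroup collects only the n elements it matched itself, so nouns from one preposition group cannot leak into another. That is the separation a single start rule with noun3+=n could not provide.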