ANTLR replace tokens in a recursive manner


I have the following grammar:

rule: q=QualifiedName {System.out.println($q.text);};

QualifiedName
   :   
        i=Identifier { $i.setText($i.text + "_");}
        ('[' (QualifiedName+ | Integer)? ']')*
   ;


Integer
    : Digit Digit*
    ;

fragment
Digit 
    : '0'..'9'
    ;

fragment
Identifier
    :   (   '_'
        |   '$'
        |   ('a'..'z' | 'A'..'Z')
        )
        ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$')*
    ;

and the following Java code:

import org.antlr.runtime.*;

// Lex and parse the test input, starting at the parser rule named "rule".
ANTLRStringStream stream = new ANTLRStringStream("array1[array2[array3[index]]]");
TestLexer lexer = new TestLexer(stream);
CommonTokenStream tokens = new TokenRewriteStream(lexer);
TestParser parser = new TestParser(tokens);
try {
    parser.rule();
} catch (RecognitionException e) {
    e.printStackTrace();
}

For the input array1[array2[array3[index]]], I want to modify each identifier. I expected the output array1_[array2_[array3_[index_]]], but the output was the same as the input.

So the question is: why doesn't the setText() method work here?

EDIT:

I modified Bart's answer in the following way:

rule: q=qualifiedName {System.out.println($q.modified);};

qualifiedName returns [String modified]
   :   
        Identifier
        ('[' (qualifiedName+ | Integer)? ']')*
        {
            $modified = $text + "_";
        }
   ;

Identifier
    :   (   '_'
        |   '$'
        |   ('a'..'z' | 'A'..'Z')
        )
        ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$')*
    ;

Integer
    : Digit Digit*
    ;

fragment
Digit 
    : '0'..'9'
    ;

I want to modify each token matched by the rule qualifiedName. I tried the code above, and for the input array1[array2[array3[index]]] I expected the output array1[array2[array3[index_]_]_]_, but only the end of the text was modified: array1[array2[array3[index]]]_.

How can I solve this?


Solution

  • You can only call setText(...) on a token once it has been created. Here QualifiedName is a lexer rule that calls itself recursively, and the text you set on the nested Identifier match is not carried into the QualifiedName token the lexer finally emits, so it has no effect. Make QualifiedName a parser rule instead of a lexer rule, and remove the fragment keyword before Identifier.

    rule: q=qualifiedName {System.out.println($q.text);};
    
    qualifiedName
       :   
            i=Identifier { $i.setText($i.text + "_");}
            ('[' (qualifiedName+ | Integer)? ']')*
       ;
    
    Identifier
        :   (   '_'
            |   '$'
            |   ('a'..'z' | 'A'..'Z')
            )
            ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$')*
        ;
    
    Integer
        : Digit Digit*
        ;
    
    fragment
    Digit 
        : '0'..'9'
        ;
    

    Now, it will print: array1_[array2_[array3_[index_]]] on the console.

    EDIT

    I have no idea why you'd want to do that, but it seems you're simply trying to rewrite ] into ]_, which can be done in the same way as I showed above:

    qualifiedName
       :   
            Identifier
            ('[' (qualifiedName+ | Integer)? t=']' {$t.setText("]_");} )*
       ;
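
If the goal is the exact string from the question's edit, array1[array2[array3[index_]_]_]_, each nested qualifiedName has to contribute its own already-modified text, whereas $text only gives back the original text of the matched tokens. The idea can be sketched in plain Java as a hand-written recursive descent that mirrors the qualifiedName rule and returns the modified string from every level. This is not ANTLR-generated code: the class QualifiedNameRewriter and its method names are made up for illustration, and it reads characters directly instead of tokens.

```java
// Hypothetical, hand-written sketch; it only mirrors the shape of the grammar rule
//   qualifiedName : Identifier ('[' (qualifiedName+ | Integer)? ']')* ;
class QualifiedNameRewriter {
    private final String input;
    private int pos = 0;

    private QualifiedNameRewriter(String input) {
        this.input = input;
    }

    // Parses the whole input as one qualifiedName and returns the modified text.
    static String rewrite(String input) {
        QualifiedNameRewriter parser = new QualifiedNameRewriter(input);
        String modified = parser.qualifiedName();
        if (parser.pos != input.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + parser.pos);
        }
        return modified;
    }

    // Builds the result from the nested results, then appends "_" for this level.
    private String qualifiedName() {
        StringBuilder sb = new StringBuilder(identifier());
        while (peek() == '[') {
            sb.append(next());
            if (isIdentifierStart(peek())) {
                // qualifiedName+ : use the modified text returned by each nested call
                while (isIdentifierStart(peek())) {
                    sb.append(qualifiedName());
                }
            } else {
                // Integer (optional): copy digits unchanged
                while (isDigit(peek())) {
                    sb.append(next());
                }
            }
            if (peek() != ']') {
                throw new IllegalArgumentException("Expected ']' at index " + pos);
            }
            sb.append(next());
        }
        return sb.append('_').toString();
    }

    // Consumes one identifier: a start character followed by letters, digits, '_' or '$'.
    private String identifier() {
        if (!isIdentifierStart(peek())) {
            throw new IllegalArgumentException("Expected identifier at index " + pos);
        }
        int start = pos;
        while (isIdentifierStart(peek()) || isDigit(peek())) {
            pos++;
        }
        return input.substring(start, pos);
    }

    // Returns the current character, or '\0' at the end of the input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private char next() {
        return input.charAt(pos++);
    }

    private static boolean isIdentifierStart(char c) {
        return c == '_' || c == '$' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        // prints array1[array2[array3[index_]_]_]_
        System.out.println(rewrite("array1[array2[array3[index]]]"));
    }
}
```

In the grammar itself, the same effect would presumably come from appending the nested rule's returned $modified value inside the loop rather than relying on $text.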