
antlr4 TokenStreamRewriter changing multiple channels


I have a generated ANTLR4 parser and a custom listener that I'm using to rewrite VB6 code, and it works well enough.

All comments are sent to the HIDDEN channel.

I need to rewrite some of the comments as well (say, append an "X" to every comment). What is the best way to do this? I don't want to send comments to the default channel. I have an idea of how to do this using EnterEveryRule, but it seems rather clumsy.

The program is basically this:

string input = "some input";

var stream = new AntlrInputStream(input);
var lexer = new VisualBasic6Lexer(stream);
var tokens = new CommonTokenStream(lexer); 
var parser = new VisualBasic6Parser(tokens);
parser.Interpreter.PredictionMode = PredictionMode.LL_EXACT_AMBIG_DETECTION;

var tree = parser.startRule();

var rwr = new TokenStreamRewriter(tokens);
var lstnr = new MyListener(lexer, parser, rwr);
ParseTreeWalker.Default.Walk(lstnr, tree);

var output = rwr.GetText();

MyListener makes changes in the following way:

public override void EnterModuleConfig([NotNull] VisualBasic6Parser.ModuleConfigContext context)
{
    base.EnterModuleConfig(context);
    Rewriter.InsertBefore(context.Start, "'");
    Rewriter.InsertBefore(context.Stop.TokenIndex - 1, "'");
}

My current approach:

public override void EnterEveryRule([NotNull] ParserRuleContext context)
{
    base.EnterEveryRule(context);

    var tokIx = context.Start.TokenIndex - 1;
    if (tokIx < 0) return; // the rule starts at the first token, so nothing precedes it

    var tokBefore = Rewriter.TokenStream.Get(tokIx);
    if (tokBefore.Type == VisualBasic6Lexer.COMMENT)
    {
        Rewriter.Replace(tokIx, tokBefore.Text + "X");
    }
}

Solution

  • All comments are sent to the HIDDEN channel (the only sane thing to do), so the parser ignores them and they will not appear in any parse-tree context.

    You will need to iterate the token stream to modify hidden tokens.

    You can use the getTokens overload that takes a set of token types, which lets you fetch just the COMMENT tokens.

    Simple example:

    import java.io.IOException;
    import java.util.HashSet;
    import org.antlr.v4.runtime.CharStreams;
    import org.antlr.v4.runtime.CommonTokenStream;
    import org.antlr.v4.runtime.TokenStreamRewriter;
    
    public class TestVB6 {
        public static void main(String... args) throws IOException {
            var charStream = CharStreams.fromFileName("./examples/helloworld.vb");
            var lexer = new VisualBasic6Lexer(charStream);
            var tokenStream = new CommonTokenStream(lexer);
            var parser = new VisualBasic6Parser(tokenStream);
            var tree = parser.startRule();
    
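            tokenStream.fill(); // make sure every token up to EOF is buffered
            var rewriter = new TokenStreamRewriter(parser.getInputStream());
            var comments = new HashSet<Integer>();
            comments.add(VisualBasic6Lexer.COMMENT);
            // getTokens returns null, not an empty list, when no token matches
            var commentTokens = tokenStream.getTokens(0, tokenStream.size() - 1, comments);
            if (commentTokens != null) {
                for (var token : commentTokens) {
                    // COMMENT tokens sit on the HIDDEN channel but are still in the buffer
                    rewriter.insertAfter(token, " X");
                }
            }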
    
            System.out.println(rewriter.getText());
        }
    }
    

    where the input source is:

    Private Sub cmdHello_Click()
        ' comment for testing
        txtHello.Text = "Hello World!"
        With txtHello
            .Font = "Arial"
            .FontSize = 16
            .ForeColor = vbBlue
        End With
    End Sub
    

    produces the following output:

    Private Sub cmdHello_Click()
        ' comment for testing X
        txtHello.Text = "Hello World!"
        With txtHello
            .Font = "Arial"
            .FontSize = 16
            .ForeColor = vbBlue
        End With
    End Sub
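Iterating the token stream works even though the parser never sees comments because CommonTokenStream buffers every token the lexer emits, whatever its channel, and TokenStreamRewriter.getText() renders that whole buffer with the queued edits applied. The plain-Java sketch below models only that mechanism so it can run without ANTLR or the generated VB6 lexer. The Tok class, the token type numbers, and the helpers tokensOfType, render and rewrite are hypothetical stand-ins, not the ANTLR API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of a token buffer plus a rewriter; NOT the ANTLR API.
public class HiddenTokenRewriteModel {
    // Channel numbers follow ANTLR's convention: 0 = default, 1 = hidden.
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    // Hypothetical token types standing in for VisualBasic6Lexer constants.
    static final int IDENTIFIER = 1;
    static final int WS = 2;
    static final int COMMENT = 3;
    static final int NEWLINE = 4;

    static final class Tok {
        final int index, type, channel;
        final String text;

        Tok(int index, int type, int channel, String text) {
            this.index = index;
            this.type = type;
            this.channel = channel;
            this.text = text;
        }
    }

    // Filters by token type only; the channel is never consulted.
    static List<Tok> tokensOfType(List<Tok> buffer, Set<Integer> types) {
        List<Tok> result = new ArrayList<>();
        for (Tok t : buffer) {
            if (types.contains(t.type)) {
                result.add(t);
            }
        }
        return result;
    }

    // Emits every buffered token (hidden ones included) plus any text queued after it.
    static String render(List<Tok> buffer, Map<Integer, String> insertAfter) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : buffer) {
            sb.append(t.text);
            sb.append(insertAfter.getOrDefault(t.index, ""));
        }
        return sb.toString();
    }

    // Queues " X" after every COMMENT token, then renders the result.
    static String rewrite(List<Tok> buffer) {
        Map<Integer, String> insertAfter = new HashMap<>();
        for (Tok t : tokensOfType(buffer, Set.of(COMMENT))) {
            insertAfter.put(t.index, " X");
        }
        return render(buffer, insertAfter);
    }

    // One line of the VB6 example, tokenized by hand.
    static List<Tok> sampleBuffer() {
        return List.of(
            new Tok(0, WS, HIDDEN_CHANNEL, "    "),
            new Tok(1, COMMENT, HIDDEN_CHANNEL, "' comment for testing"),
            new Tok(2, NEWLINE, DEFAULT_CHANNEL, "\n"),
            new Tok(3, IDENTIFIER, DEFAULT_CHANNEL, "txtHello"));
    }

    public static void main(String[] args) {
        System.out.print(rewrite(sampleBuffer()));
    }
}
```

Running main prints the sample line with " X" appended to the comment. In real code the filtering step is the getTokens call and the rendering step is rewriter.getText(); the model only illustrates that the channel plays no part in either.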