
How to get from parse tree to Java class file


I am working on a command-line tool with the following functionality:

  1. Parse modified .java files using an extended ANTLR4 Java9 grammar. The syntax in the files is Java with one modification: method declarations include a purpose, as in this example: public void {marketing} sendEmail() {}
  2. Collect and remove all purposes using a visitor. Collection and analysis of the purposes is the main functionality of the program.
  3. Compile and execute the Java files from which the purposes have been removed.

I am looking for the simplest and most effective way to achieve step 3. Building a full compiler is outside the scope of my project; I would prefer to reuse the existing Java compiler and run javac if possible. I have considered the following approaches, but none seem optimal:

Any input is much appreciated.


Solution

  • You could use TokenStreamRewriter to get the source code without the purpose nodes (it handles many other rewriting tasks as well). Here's an example from an application where I conditionally add a top-level LIMIT clause to a MySQL query:

    /**
    001     * Parses the query to see if there's already a top-level limit clause. If none was found, the query is
    002     * rewritten to include a limit clause with the given values.
    003     *
    004     * @param query The query to check and modify.
    005     * @param serverVersion The version of MySQL to use for checking.
    006     * @param sqlMode The current SQL mode in the server.
    007     * @param offset The limit offset to add.
    008     * @param count The row count value to add.
    009     *
    010     * @returns The rewritten query if the original query is error free and contained no top-level LIMIT clause.
    011     *          Otherwise the original query is returned.
    012     */
    013    public checkAndApplyLimits(query: string, serverVersion: number, sqlMode: string, offset: number,
    014        count: number): [string, boolean] {
    015
    016        this.applyServerDetails(serverVersion, sqlMode);
    017        const tree = this.startParsing(query, false, MySQLParseUnit.Generic);
    018        if (!tree || this.errors.length > 0) {
    019            return [query, false];
    020        }
    021
    022        const rewriter = new TokenStreamRewriter(this.tokenStream);
    023        const expressions = XPath.findAll(tree, "/query/simpleStatement//queryExpression", this.parser);
    024        let changed = false;
    025        if (expressions.size > 0) {
    026            // There can only be one top-level query expression where we can add a LIMIT clause.
    027            const candidate: ParseTree = expressions.values().next().value;
    028
    029            // Check if the candidate comes from a subquery.
    030            let run: ParseTree | undefined = candidate;
    031            let invalid = false;
    032            while (run) {
    033                if (run instanceof SubqueryContext) {
    034                    invalid = true;
    035                    break;
    036                }
    037
    038                run = run.parent;
    039            }
    040
    041            if (!invalid) {
    042                // Top level query expression here. Check if there's already a LIMIT clause before adding one.
    043                const context = candidate as QueryExpressionContext;
    044                if (!context.limitClause() && context.stop) {
    045                    // OK, ready to add an own limit clause.
    046                    rewriter.insertAfter(context.stop, ` LIMIT ${offset}, ${count}`);
    047                    changed = true;
    048                }
    049            }
    050        }
    051
    052        return [rewriter.getText(), changed];
    053    }
    
    

    What this code does:

    • Line 017: the input is parsed to get a parse tree. If you already have a parse tree, you can of course pass it in instead of parsing again.
    • Line 022 creates a new TokenStreamRewriter instance for your token stream.
    • Line 023 uses ANTLR4's XPath feature to find all nodes matching a path of rule names (here, every queryExpression below a top-level simpleStatement). This is where you could retrieve all your purpose contexts in one go, which would also cover your step 2.
    • The following lines only check whether a LIMIT clause must be added at all. They are not relevant for your use case.
    • Line 046 is where the token stream is manipulated. Here text is inserted, but the rewriter can also replace or delete token ranges, such as the tokens spanned by a purpose node.
    • Line 052 is probably what you are most interested in: it returns the original input text with all rewrite operations applied.
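The answer's example uses the TypeScript runtime, but the Java runtime ships the same class (org.antlr.v4.runtime.TokenStreamRewriter, with delete(...) and getText()). To show the idea for a Java tool without depending on a generated parser, here is a minimal self-contained sketch under stated assumptions: the toy tokenizer and the Rewriter class are hypothetical stand-ins for the generated lexer and TokenStreamRewriter, and the "{identifier} between two identifiers" check stands in for finding purpose contexts in a real parse tree. Like the real rewriter, it records deletions against token indexes and only applies them when the text is rendered, and it collects method name to purpose in the same pass.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PurposeStripper {

    // Toy lexer (illustration only): identifier runs, whitespace runs, single-char symbols.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int j = i + 1;
            if (Character.isJavaIdentifierStart(c)) {
                while (j < src.length() && Character.isJavaIdentifierPart(src.charAt(j))) j++;
            } else if (Character.isWhitespace(c)) {
                while (j < src.length() && Character.isWhitespace(src.charAt(j))) j++;
            }
            tokens.add(src.substring(i, j));
            i = j;
        }
        return tokens;
    }

    // Hypothetical stand-in for TokenStreamRewriter: deletions are recorded
    // by token index and applied only when getText() renders the result.
    static final class Rewriter {
        private final List<String> tokens;
        private final boolean[] deleted;

        Rewriter(List<String> tokens) {
            this.tokens = tokens;
            this.deleted = new boolean[tokens.size()];
        }

        void delete(int from, int to) {
            for (int k = from; k <= to; k++) deleted[k] = true;
        }

        String getText() {
            StringBuilder sb = new StringBuilder();
            for (int k = 0; k < tokens.size(); k++) {
                if (!deleted[k]) sb.append(tokens.get(k));
            }
            return sb.toString();
        }
    }

    static boolean isIdent(String t) { return Character.isJavaIdentifierStart(t.charAt(0)); }

    static boolean isWs(String t) { return Character.isWhitespace(t.charAt(0)); }

    // Index of the nearest non-whitespace token from k, moving by step; -1 if none.
    static int skipWs(List<String> ts, int k, int step) {
        while (k >= 0 && k < ts.size() && isWs(ts.get(k))) k += step;
        return (k >= 0 && k < ts.size()) ? k : -1;
    }

    // Removes every "{identifier}" that sits between a return type and a method
    // name, recording methodName -> purpose in the given map.
    public static String stripPurposes(String src, Map<String, String> purposes) {
        List<String> ts = tokenize(src);
        Rewriter rw = new Rewriter(ts);
        for (int k = 0; k + 2 < ts.size(); k++) {
            boolean shape = ts.get(k).equals("{") && isIdent(ts.get(k + 1)) && ts.get(k + 2).equals("}");
            if (!shape) continue;
            int prev = skipWs(ts, k - 1, -1);
            int next = skipWs(ts, k + 3, +1);
            if (prev >= 0 && next >= 0 && isIdent(ts.get(prev)) && isIdent(ts.get(next))) {
                purposes.put(ts.get(next), ts.get(k + 1)); // collect (step 2)
                rw.delete(k, k + 2);                       // remove (step 2)
            }
        }
        return rw.getText();
    }

    public static void main(String[] args) {
        Map<String, String> purposes = new LinkedHashMap<>();
        String out = stripPurposes("public void {marketing} sendEmail() {}", purposes);
        System.out.println(out);      // public void  sendEmail() {}
        System.out.println(purposes); // {sendEmail=marketing}
    }
}
```

With a real ANTLR parser you would instead call rewriter.delete(ctx.start, ctx.stop) for each purpose context found by your visitor or XPath query. The double space left behind in "void  sendEmail" is harmless to javac.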

    With the rewritten text you can create a temporary Java file for compilation. The same pass can also perform two steps from your list at once (collecting the purposes and removing them).
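For step 3, javac does not have to be started as a separate process: on a JDK (not a plain JRE), javax.tools.ToolProvider.getSystemJavaCompiler() exposes the compiler to Java code, and its run(...) method accepts the same arguments as the javac command line. The sketch below assumes JDK 11 or later and that the rewritten source is a single public class with a static main method; the helper name compileAndRun and the class name Mailer are made up for illustration.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileAndRun {

    // Writes the purpose-free source to a temporary directory, compiles it with
    // the JDK's built-in javac, then loads the class and invokes its main method.
    // Returns false if compilation fails (javac prints the errors to stderr).
    public static boolean compileAndRun(String className, String source) throws Exception {
        Path dir = Files.createTempDirectory("purpose-free");
        Path file = dir.resolve(className + ".java");
        Files.writeString(file, source);

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK, not a JRE");
        }
        // Equivalent to: javac -d <dir> <dir>/<className>.java (0 means success)
        int status = javac.run(null, null, null, "-d", dir.toString(), file.toString());
        if (status != 0) {
            return false;
        }

        // Load the freshly compiled class from the temp directory and run it.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = loader.loadClass(className);
            Method main = cls.getMethod("main", String[].class);
            main.invoke(null, (Object) new String[0]);
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        String src = "public class Mailer {\n"
                + "    public static void main(String[] args) { sendEmail(); }\n"
                + "    static void  sendEmail() { System.out.println(\"sending\"); }\n"
                + "}\n";
        System.out.println(compileAndRun("Mailer", src));
    }
}
```

If you would rather keep the generated sources out of the file system, javax.tools also supports in-memory sources via SimpleJavaFileObject and JavaCompiler.getTask(...), at the cost of a bit more setup.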