Tags: java, abstract-syntax-tree, jython

How to get the AST from Python code, modify it, and write it back to a file, in Java?


Question

How can one read an arbitrary Python file, build an abstract syntax tree from it, modify that tree, and then write the modified AST back to a file, in Java? (Small note: for a concrete syntax tree, which also includes spacing, comments, etc., one could call this pip package from Java.)

Approach

I tried the following approach to first read the Python code and then generate the abstract syntax tree (AST):

package com.doctestbot.cli;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.python.core.Py;
import org.python.core.PyObject;
import org.python.core.PyString;
import org.python.util.PythonInterpreter;

/**
 * A class to retrieve the Python abstract syntax tree using Jython. This is a utility class,
 * meaning one only calls its method, and one does not instantiate the object.
 */
public final class PythonAstRetriever {
  /**
   * Retrieves the Python abstract syntax tree for the given Python code.
   *
   * @param pythonCode The Python code for which to retrieve the AST.
   * @return The Python abstract syntax tree as a PyObject.
   */
  @SuppressWarnings({"PMD.LawOfDemeter"})
  public static PyObject getPythonAst(String pythonCode) {
    // Create a PythonInterpreter
    PythonInterpreter interpreter = new PythonInterpreter();

    // Access the "ast" module from Python
    PyObject astModule = interpreter.get("ast");

    // Parse the Python code and generate the AST
    PyObject invokeArg = new PyString(pythonCode);

    return astModule.invoke("parse", invokeArg, Py.None, Py.None);
  }

  /**
   * Reads the content of a Python code file from the specified file path.
   *
   * @param filePath The path to the Python code file to read.
   * @return The content of the Python code file as a string.
   * @throws IOException If an I/O error occurs while reading the file.
   */
  public static String readPythonCodeFromFile(String filePath) throws IOException {
    Path path = Paths.get(filePath);
    return Files.readString(path);
  }

  // Private constructor to prevent instantiation of the utility class.
  private PythonAstRetriever() {
    throw new AssertionError("PythonAstRetriever class should not be instantiated.");
  }
}

However, when I run it with:

String pythonCode = 
            "\"\"\"Example python file with a function.\"\"\"\n" +
            "\n" +
            "from typeguard import typechecked\n" +
            "\n" +
            "@typechecked\n" +
            "def add_two(*, x: int) -> int:\n" +
            "    \"\"\"Adds a value to an incoming number.\"\"\"\n" +
            "    return x + 2";
PyObject astTree = PythonAstRetriever.getPythonAst(pythonCode);

The build then fails with the following error:

Error

PythonAstRetriever.java:34: error: incompatible types: PyObject cannot be converted to PyObject[]
    return astModule.invoke("parse", invokeArg, Py.None, Py.None);
                                                  ^
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output

Full Compiler Output

In response to the comments, below is the full compiler output:

PythonAstRetriever.java:34: error: no suitable method found for invoke(String,PyObject,PyObject,PyObject)
    return astModule.invoke("parse", invokeArg, Py.None, Py.None);
                    ^
    method PyObject.invoke(String,PyObject[],String[]) is not applicable
      (actual and formal argument lists differ in length)
    method PyObject.invoke(String,PyObject[]) is not applicable
      (actual and formal argument lists differ in length)
    method PyObject.invoke(String) is not applicable
      (actual and formal argument lists differ in length)
    method PyObject.invoke(String,PyObject) is not applicable
      (actual and formal argument lists differ in length)
    method PyObject.invoke(String,PyObject,PyObject) is not applicable
      (actual and formal argument lists differ in length)
    method PyObject.invoke(String,PyObject,PyObject[],String[]) is not applicable
      (argument mismatch; PyObject cannot be converted to PyObject[])
1 error
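The compiler output lists every invoke overload that PyObject offers, and none of them accepts three positional PyObject arguments after the method name. Below is a minimal sketch of a call that does match one of the listed overloads, assuming the Jython 2.7 standalone jar is on the classpath; the class name AstInvokeSketch is hypothetical. Two further points apply to the original code: interpreter.get("ast") returns null unless "import ast" has been executed first, and Jython 2.7 implements Python 2.7 syntax, so the example input above (the keyword-only "*, x" and the int annotations) will most likely be rejected with a SyntaxError. The sketch therefore parses Python 2-compatible code.

```java
import org.python.core.PyObject;
import org.python.core.PyString;
import org.python.util.PythonInterpreter;

/** Hypothetical helper showing a matching invoke overload (a sketch, not a full solution). */
public final class AstInvokeSketch {
  private AstInvokeSketch() {}

  /** Parses Python 2.7-compatible source with Jython's ast module. */
  public static PyObject parse(String pythonCode) {
    PythonInterpreter interpreter = new PythonInterpreter();
    // Without this import, interpreter.get("ast") would return null.
    interpreter.exec("import ast");
    PyObject astModule = interpreter.get("ast");
    // invoke(String, PyObject) is one of the overloads listed in the compiler output.
    return astModule.invoke("parse", new PyString(pythonCode));
  }

  public static void main(String[] args) {
    // Python 2.7 syntax: no keyword-only arguments and no annotations.
    PyObject tree = parse("def add_two(x):\n    return x + 2\n");
    // The Module node's body holds one FunctionDef here.
    System.out.println(tree.__getattr__("body").__len__());
  }
}
```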

FAILURE: Build failed with an exception.

XY-problem

In response to the comments, the underlying goal (the X of this XY problem) is a bot that modifies code: it changes or writes docstrings, function documentation and/or function comments, and writes tests for those functions. I would like to perform a separate modification or addition for each modular component of a file. So instead of writing a regex or a hand-written Python parser, I assumed that using the AST would be an effective way to obtain the code components in a hierarchical, modular fashion.
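For this goal, one way to sidestep unparsing entirely is to use the AST only to locate components and then edit the original text, which also preserves comments and formatting. Python statement nodes carry a 1-based lineno, so the first statement of a FunctionDef's body tells where the function body starts. The following is a minimal pure-Java sketch of the splicing step, assuming that line number has already been read from the AST. The class and method names are hypothetical, and the sketch assumes the body starts on its own line (not "def f(): return 1"), that the function has no docstring yet, and that the docstring text contains no triple quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: inserts a docstring into the original source text. */
public final class DocstringSplicer {
  private DocstringSplicer() {}

  /**
   * Inserts a one-line docstring directly above the first statement of a function body.
   *
   * @param source the original Python source
   * @param bodyLineno 1-based lineno of the function's first body statement (from the AST)
   * @param docstring docstring text without surrounding quotes
   */
  public static String insertDocstring(String source, int bodyLineno, String docstring) {
    // The -1 limit keeps trailing empty strings, so a final newline survives the round trip.
    List<String> lines = new ArrayList<>(Arrays.asList(source.split("\n", -1)));
    String target = lines.get(bodyLineno - 1);
    // Copy the target statement's leading whitespace so spaces or tabs are preserved.
    int i = 0;
    while (i < target.length() && (target.charAt(i) == ' ' || target.charAt(i) == '\t')) {
      i++;
    }
    String indent = target.substring(0, i);
    // Inserting at index (lineno - 1) places the docstring just above that statement.
    lines.add(bodyLineno - 1, indent + "\"\"\"" + docstring + "\"\"\"");
    return String.join("\n", lines);
  }

  public static void main(String[] args) {
    String source = "def add_two(x):\n    # keep me\n    return x + 2\n";
    // In this source, the body's first statement ("return x + 2") is on line 3.
    System.out.print(insertDocstring(source, 3, "Adds two to x."));
  }
}
```

Comments are not statements, so the docstring inserted after "# keep me" is still the function's first statement and remains a valid docstring. The updated text can then be written back with Files.writeString(path, updated). When inserting into several functions, process them from the bottom of the file upwards so earlier line numbers stay valid.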


Solution

  • Scope

    The compile error caused by the Py.None arguments is resolved below. However, converting an AST back into Python code appears to be non-trivial, so this is not an answer to the XY problem.

    Compile Error Solution

    This code resolves the compile error:

    package com.doctestbot.cli;
    
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import org.python.core.PyObject;
    import org.python.core.PyString;
    import org.python.util.PythonInterpreter;
    
    /**
     * A class to retrieve the Python abstract syntax tree using Jython. This is a utility class,
     * meaning one only calls its method, and one does not instantiate the object.
     */
    public final class PythonAstRetriever {
      /**
       * Retrieves the Python abstract syntax tree for the given Python code.
       *
       * @param pythonCode The Python code for which to retrieve the AST.
       * @return The Python abstract syntax tree as a PyObject.
       */
      @SuppressWarnings({"PMD.LawOfDemeter"})
      public static PyObject getPythonAst(String pythonCode) {
        // Create a PythonInterpreter
        PythonInterpreter interpreter = new PythonInterpreter();
    
        // Import the ast module; without this import, interpreter.get("ast") returns null
        interpreter.exec("import ast");
    
        // Look up ast.parse (Jython 2.7 accepts Python 2.7 syntax only)
        PyObject invokeArg = new PyString(pythonCode);
        PyObject astModule = interpreter.get("ast");
        PyObject parseFunction = astModule.__getattr__("parse");
    
        // Call ast.parse and return the resulting ast.Module node
        return parseFunction.__call__(invokeArg);
      }
    
      /**
       * Returns the ast.dump() representation of the given AST. This is a debug view of the
       * tree, not Python source code.
       *
       * @param pythonModule The ast.Module node returned by getPythonAst.
       * @return The string produced by ast.dump.
       */
      @SuppressWarnings({"PMD.LawOfDemeter"})
      public static String pythonAstToString(PyObject pythonModule) {
        // Initialise the interpreter and import the ast module.
        PythonInterpreter interpreter = new PythonInterpreter();
        interpreter.exec("import ast");
        PyObject astModule = interpreter.get("ast");
    
        // Get a string representation of the AST (not the original source code)
        PyObject dumpFunction = astModule.__getattr__("dump");
        PyObject astDump = dumpFunction.__call__(pythonModule);
    
        return astDump.toString();
      }
    
      /**
       * Reads the content of a Python code file from the specified file path.
       *
       * @param filePath The path to the Python code file to read.
       * @return The content of the Python code file as a string.
       * @throws IOException If an I/O error occurs while reading the file.
       */
      public static String readPythonCodeFromFile(String filePath) throws IOException {
        Path path = Paths.get(filePath);
        return Files.readString(path);
      }
    
      // Private constructor to prevent instantiation of the utility class.
      private PythonAstRetriever() {
        throw new AssertionError("PythonAstRetriever class should not be instantiated.");
      }
    }
    
    

    Test File

    It was tested with the following test file:

    package com.doctestbot;
    
    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertNotNull;
    
    import com.doctestbot.cli.Constants;
    import com.doctestbot.cli.PythonAstRetriever;
    import com.doctestbot.cli.SubmoduleManager;
    import java.io.IOException;
    import org.junit.jupiter.api.BeforeAll;
    import org.junit.jupiter.api.Test;
    import org.python.core.PyObject;
    
    /**
     * Test scenarios for parsing and rewriting a Python file.
     *
     * <p>The following scenarios are tested:
     *
     * <pre>
     * * Tests a Python file with:
     *   - methods
     *   - documentation + methods
     *   - docstring, documentation + methods
     *
     * * class
     *   - documentation + class
     *   - docstring + documentation + class
     *
     * * class + classmethods
     *   - documentation + class + classmethods
     *   - docstring + documentation + class + classmethods
     *
     * * class + methods
     *   - documentation + class + methods
     *   - docstring + documentation + class + methods
     *
     * * class + classmethods + methods
     *   - documentation + class + classmethods + methods
     *   - docstring + documentation + class + classmethods + methods
     *
     * * gets parsed and rewritten correctly.
     * </pre>
     */
    @SuppressWarnings({"PMD.AtLeastOneConstructor"})
    public class TestPythonParsing {
    
      @BeforeAll
      public static void setupOnce() {
        SubmoduleManager.checkoutTestRepoBranch(
            "test-parsing", "854f5ccb7954350b51d02532295c05b65fbdc6d8");
      }
    
      /**
       * Tests the addition operation. It verifies that adding two positive integers results in the
       * correct sum.
       */
      @Test
      void testAddition() {
        int result = 3 + 5;
        assertEquals(8, result, "Addition operation should yield the sum of two numbers.");
        assertNotNull(result, "msg");
      }
    
      /** Tests parsing and recreating a Python file with only methods. */
      @Test
      @SuppressWarnings({"PMD.LawOfDemeter"})
      public void testParseAndRecreateMethodsOnly() throws IOException {
        // Path to the Python code file
        String filePath = Constants.testRepoPath + "/src/pythontemplate/methods.py";
    
        // Read Python code from the file
        String pythonCode = PythonAstRetriever.readPythonCodeFromFile(filePath);
    
        // Parse the Python code
        PyObject astTree = PythonAstRetriever.getPythonAst(pythonCode);
    
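        // Note: this returns the ast.dump() debug representation, not Python source code.
        PythonAstRetriever.pythonAstToString(astTree);
    
        // toString() yields the repr of the ast.Module node, not the source. An AST-to-source
        // step (an unparser) is still missing, so the assertion below currently fails.
        String recreatedCode = astTree.toString();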
    
        System.out.println("recreatedCode" + recreatedCode);
    
        // Assert the parsed and recreated code match
        assertEquals(pythonCode, recreatedCode, "Parsed and recreated code should match");
      }
    }
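Regarding the scope note in the Solution section: Jython 2.7 follows Python 2.7, whose ast module has no unparse function; ast.unparse was only added in CPython 3.9. If calling an external interpreter is acceptable, one option is to let CPython perform both the parse and the unparse and read the result back into Java. The sketch below assumes a CPython 3.9+ executable reachable as python3 on the PATH; the class name is hypothetical. Be aware that ast.unparse drops comments and normalizes formatting, so comparing its output to the original text with assertEquals will generally fail even with a working unparser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: parse and unparse Python source with an external CPython 3.9+. */
public final class CPythonRoundTrip {
  private CPythonRoundTrip() {}

  // Reads UTF-8 source from stdin, parses it, and writes ast.unparse(...) to stdout as UTF-8.
  private static final String SCRIPT =
      "import ast, sys; src = sys.stdin.buffer.read().decode('utf-8'); "
          + "sys.stdout.buffer.write(ast.unparse(ast.parse(src)).encode('utf-8'))";

  public static String roundTrip(String pythonCode) throws IOException, InterruptedException {
    Process process =
        new ProcessBuilder("python3", "-c", SCRIPT)
            .redirectError(ProcessBuilder.Redirect.INHERIT) // show Python errors, avoid blocking
            .start();
    // Send the whole source, then close stdin so Python's read() returns.
    try (OutputStream stdin = process.getOutputStream()) {
      stdin.write(pythonCode.getBytes(StandardCharsets.UTF_8));
    }
    String output;
    try (InputStream stdout = process.getInputStream()) {
      output = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
    }
    if (process.waitFor() != 0) {
      throw new IOException("python3 exited with a non-zero status");
    }
    return output;
  }
}
```

The modification step could then be done on the Python side (for example with an ast.NodeTransformer) inside the script, with Java only handling file I/O, at the cost of losing comments and original formatting.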