How can one read an arbitrary Python file, build an abstract syntax tree (AST) from it, modify that AST, and then write the modified AST back to the file, in Java? (Small note: for a concrete syntax tree, which also keeps spacing, comments, etc., one could call this pip package from Java.)
I first tried the following code to read the Python code and generate its AST:
package com.doctestbot.cli;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.python.core.Py;
import org.python.core.PyObject;
import org.python.core.PyString;
import org.python.util.PythonInterpreter;

/**
 * A class to retrieve the Python abstract syntax tree using Jython. This is a utility class,
 * meaning one only calls its method, and one does not instantiate the object.
 */
public final class PythonAstRetriever {
    /**
     * Retrieves the Python abstract syntax tree for the given Python code.
     *
     * @param pythonCode The Python code for which to retrieve the AST.
     * @return The Python abstract syntax tree as a PyObject.
     */
    @SuppressWarnings({"PMD.LawOfDemeter"})
    public static PyObject getPythonAst(String pythonCode) {
        // Create a PythonInterpreter
        PythonInterpreter interpreter = new PythonInterpreter();
        // Access the "ast" module from Python
        PyObject astModule = interpreter.get("ast");
        // Parse the Python code and generate the AST
        PyObject invokeArg = new PyString(pythonCode);
        return astModule.invoke("parse", invokeArg, Py.None, Py.None);
    }

    /**
     * Reads the content of a Python code file from the specified file path.
     *
     * @param filePath The path to the Python code file to read.
     * @return The content of the Python code file as a string.
     * @throws IOException If an I/O error occurs while reading the file.
     */
    public static String readPythonCodeFromFile(String filePath) throws IOException {
        Path path = Paths.get(filePath);
        return Files.readString(path);
    }

    // Private constructor to prevent instantiation of the utility class.
    private PythonAstRetriever() {
        throw new AssertionError("PythonAstRetriever class should not be instantiated.");
    }
}
However, when I run it with:
String pythonCode =
    "\"\"\"Example python file with a function.\"\"\"\n" +
    "\n" +
    "from typeguard import typechecked\n" +
    "\n" +
    "@typechecked\n" +
    "def add_two(*, x: int) -> int:\n" +
    "    \"\"\"Adds a value to an incoming number.\"\"\"\n" +
    "    return x + 2";
PyObject astTree = PythonAstRetriever.getPythonAst(pythonCode);
This yields the following compilation error:
PythonAstRetriever.java:34: error: incompatible types: PyObject cannot be converted to PyObject[]
return astModule.invoke("parse", invokeArg, Py.None, Py.None);
^
Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output
In response to the comments, below is the full compiler output:
PythonAstRetriever.java:34: error: no suitable method found for invoke(String,PyObject,PyObject,PyObject)
return astModule.invoke("parse", invokeArg, Py.None, Py.None);
^
method PyObject.invoke(String,PyObject[],String[]) is not applicable
(actual and formal argument lists differ in length)
method PyObject.invoke(String,PyObject[]) is not applicable
(actual and formal argument lists differ in length)
method PyObject.invoke(String) is not applicable
(actual and formal argument lists differ in length)
method PyObject.invoke(String,PyObject) is not applicable
(actual and formal argument lists differ in length)
method PyObject.invoke(String,PyObject,PyObject) is not applicable
(actual and formal argument lists differ in length)
method PyObject.invoke(String,PyObject,PyObject[],String[]) is not applicable
(argument mismatch; PyObject cannot be converted to PyObject[])
1 error
FAILURE: Build failed with an exception.
In response to the comments: the underlying goal behind this XY problem is a bot that modifies code. It changes or writes docstrings, function documentation and/or function comments, and writes tests for those functions. I would like to perform a separate modification or creation for each modular component of a file. So instead of writing a regex or a hand-written Python parser, I assumed that using the AST would be an effective way to obtain the code components in a hierarchical and modular fashion.
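One way to get per-component pieces without regenerating any code from the AST is to use the AST only for positions and slice the original text. The following is a minimal sketch, assuming the 1-based start line of every top-level statement in the module's body has already been read from the parsed tree (for example each node's lineno attribute). Depending on the Python version, a decorated definition's lineno may point at the def line rather than its first decorator, so for such nodes the start should be the smallest line number among the node and its decorator_list entries. ComponentSlicer and slice are hypothetical names. Note also that Jython implements Python 2.7, so its ast.parse would reject the keyword-only argument and type annotations in the example above; the start lines in the usage below are simply read off that example by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: slices source text into top-level components by AST start lines. */
final class ComponentSlicer {
    private ComponentSlicer() {}

    /**
     * Splits {@code source} into one string per top-level statement.
     *
     * @param source Python source using "\n" line endings
     * @param startLines ascending 1-based start lines (assumed to come from each node's lineno);
     *     lines before startLines[0] are not included in any component
     * @return component i covers lines startLines[i] .. startLines[i + 1] - 1, and the last
     *     component runs to the end of the file
     */
    static List<String> slice(String source, int[] startLines) {
        // The -1 limit keeps trailing empty strings, so a final newline is not lost.
        List<String> lines = Arrays.asList(source.split("\n", -1));
        List<String> components = new ArrayList<>();
        for (int i = 0; i < startLines.length; i++) {
            int from = startLines[i] - 1; // convert 1-based line number to 0-based index
            int to = i + 1 < startLines.length ? startLines[i + 1] - 1 : lines.size();
            // Blank lines between statements stay attached to the preceding component.
            components.add(String.join("\n", lines.subList(from, to)));
        }
        return components;
    }
}
```

For the example file in the question the start lines would be 1 (module docstring), 3 (import) and 5 (decorator of add_two), giving three components. Because the pieces are cut from the original text, String.join("\n", components) reproduces the file exactly when startLines[0] is 1, so comments and formatting inside each component survive and only the component being rewritten has to change.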
The compilation error on the Py.None arguments was resolved. However, it seems to me that converting an AST back into Python code is non-trivial. Hence, this does not yet answer the underlying XY problem.
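Since regenerating the whole file from the AST is the hard part, another option is to keep the original text and only splice new lines in at positions the AST reports. This is a minimal sketch under these assumptions: the function has no docstring yet, bodyStartLine is the 1-based lineno of the first statement in the function's body (body[0] of the parsed FunctionDef), that statement sits on its own line, the file uses "\n" line endings, and the docstring text contains no triple quotes. DocstringSplicer and insertDocstring are hypothetical names, not part of Jython or the ast module.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: inserts a docstring line without regenerating the rest of the file. */
final class DocstringSplicer {
    private DocstringSplicer() {}

    /**
     * Inserts a one-line docstring directly above the function body's first statement.
     *
     * @param source Python source using "\n" line endings
     * @param bodyStartLine 1-based line of the body's first statement (assumed: its lineno)
     * @param docstring docstring text without surrounding quotes
     * @return the source with exactly one extra line; every other line is unchanged
     */
    static String insertDocstring(String source, int bodyStartLine, String docstring) {
        List<String> lines = new ArrayList<>(Arrays.asList(source.split("\n", -1)));
        int index = bodyStartLine - 1; // 0-based index of the first body line
        if (index < 0 || index >= lines.size()) {
            throw new IllegalArgumentException("Line out of range: " + bodyStartLine);
        }
        String target = lines.get(index);
        // Reuse the body's own leading whitespace so the new line matches its indentation.
        String indent = target.substring(0, target.length() - target.stripLeading().length());
        lines.add(index, indent + "\"\"\"" + docstring + "\"\"\"");
        return String.join("\n", lines);
    }
}
```

Because only one line is added, a diff against the original shows just the new docstring, and comments and formatting elsewhere in the file are preserved. This sidesteps the AST-to-source problem rather than solving it in general.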
This code resolves the compilation error:
package com.doctestbot.cli;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.python.core.PyObject;
import org.python.core.PyString;
import org.python.util.PythonInterpreter;

/**
 * A class to retrieve the Python abstract syntax tree using Jython. This is a utility class,
 * meaning one only calls its method, and one does not instantiate the object.
 */
public final class PythonAstRetriever {
    /**
     * Retrieves the Python abstract syntax tree for the given Python code.
     *
     * @param pythonCode The Python code for which to retrieve the AST.
     * @return The Python abstract syntax tree as a PyObject.
     */
    @SuppressWarnings({"PMD.LawOfDemeter"})
    public static PyObject getPythonAst(String pythonCode) {
        // Create a PythonInterpreter
        PythonInterpreter interpreter = new PythonInterpreter();
        // Import the ast module; interpreter.get("ast") returns null until it is imported
        interpreter.exec("import ast");
        // Call ast.parse(pythonCode) to generate the AST
        PyObject invokeArg = new PyString(pythonCode);
        PyObject astModule = interpreter.get("ast");
        PyObject parseFunction = astModule.__getattr__("parse");
        return parseFunction.__call__(invokeArg);
    }

    /**
     * Returns a structural dump of the given AST, produced by Python's ast.dump. This is a
     * textual description of the tree, not regenerated Python source code.
     *
     * @param pythonModule The AST, as returned by getPythonAst.
     * @return The output of ast.dump(pythonModule) as a string.
     */
    @SuppressWarnings({"PMD.LawOfDemeter"})
    public static String pythonAstToString(PyObject pythonModule) {
        // Initialise the interpreter and import the ast module.
        PythonInterpreter interpreter = new PythonInterpreter();
        interpreter.exec("import ast");
        PyObject astModule = interpreter.get("ast");
        // Get a string representation of the AST via ast.dump
        PyObject dumpFunction = astModule.__getattr__("dump");
        PyObject astDump = dumpFunction.__call__(pythonModule);
        return astDump.toString();
    }

    /**
     * Reads the content of a Python code file from the specified file path.
     *
     * @param filePath The path to the Python code file to read.
     * @return The content of the Python code file as a string.
     * @throws IOException If an I/O error occurs while reading the file.
     */
    public static String readPythonCodeFromFile(String filePath) throws IOException {
        Path path = Paths.get(filePath);
        return Files.readString(path);
    }

    // Private constructor to prevent instantiation of the utility class.
    private PythonAstRetriever() {
        throw new AssertionError("PythonAstRetriever class should not be instantiated.");
    }
}
I tested it with the following test class:
package com.doctestbot;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import com.doctestbot.cli.Constants;
import com.doctestbot.cli.PythonAstRetriever;
import com.doctestbot.cli.SubmoduleManager;
import java.io.IOException;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;
import org.python.core.PyObject;

/**
 * Test scenarios for parsing and rewriting a Python file.
 *
 * <p>The following scenarios are tested:
 *
 * <pre>
 * * Tests a Python file with:
 *   - methods
 *   - documentation + methods
 *   - docstring, documentation + methods
 *
 * * class
 *   - documentation + class
 *   - docstring + documentation + class
 *
 * * class + classmethods
 *   - documentation + class + classmethods
 *   - docstring + documentation + class + classmethods
 *
 * * class + methods
 *   - documentation + class + methods
 *   - docstring + documentation + class + methods
 *
 * * class + classmethods + methods
 *   - documentation + class + classmethods + methods
 *   - docstring + documentation + class + classmethods + methods
 *
 * * gets parsed and rewritten correctly.
 * </pre>
 */
@SuppressWarnings({"PMD.AtLeastOneConstructor"})
public class TestPythonParsing {
    @BeforeAll
    public static void setupOnce() {
        SubmoduleManager.checkoutTestRepoBranch(
                "test-parsing", "854f5ccb7954350b51d02532295c05b65fbdc6d8");
    }

    /**
     * Tests the addition operation. It verifies that adding two positive integers results in the
     * correct sum.
     */
    @Test
    void testAddition() {
        int result = 3 + 5;
        assertEquals(8, result, "Addition operation should yield the sum of two numbers.");
        assertNotNull(result, "msg");
    }

    /** Tests parsing and recreating a Python file with only methods. */
    @Test
    @SuppressWarnings({"PMD.LawOfDemeter"})
    public void testParseAndRecreateMethodsOnly() throws IOException {
        // Path to the Python code file
        String filePath = Constants.testRepoPath + "/src/pythontemplate/methods.py";
        // Read Python code from the file
        String pythonCode = PythonAstRetriever.readPythonCodeFromFile(filePath);
        // Parse the Python code
        PyObject astTree = PythonAstRetriever.getPythonAst(pythonCode);
        PythonAstRetriever.pythonAstToString(astTree);
        // Attempt to recreate the Python code from the AST; note that toString() on the
        // AST object does not regenerate Python source code
        String recreatedCode = astTree.toString();
        System.out.println("recreatedCode" + recreatedCode);
        // Assert the parsed and recreated code match
        assertEquals(pythonCode, recreatedCode, "Parsed and recreated code should match");
    }
}