I have a Python script that contains a for-loop that iterates through a list of items. I need to perform a computation on a property of each item, but the code that does this computation is in Java (where the Java main() method accepts two arguments: arg1 and arg2 in the example below). So far, so good -- I can use subprocess to call Java.
This is how I do it currently (simplified):
from subprocess import Popen, PIPE
cp = ... # my classpath string
java_file = ... # the file with the java code
arg1 = ... # an argument string (always the same value)
items = [...] # my list of items
for item in items:
    arg2 = ... # calculated from item inside the python script
    cmd = ['java', '-cp', cp, java_file, arg1, arg2]
    process = Popen(cmd, stdout=PIPE, stderr=PIPE) # cmd is an argument list, so no shell=True
    output, errors = process.communicate()
    outp_str = output.decode('utf-8') # the result I need
It works, but because my list can contain thousands of elements, I'd be calling subprocess as many times -- which seems very inefficient.
Is there a way in which I can call subprocess only once, before the loop, and then give the active subprocess the necessary command within the loop? Or would that make no sense in terms of speed/efficiency?
I found this question, which seems to be related -- but I can't manage to translate this to my scenario. I also did not find my solution in the docs for subprocess. I imagine it would be something like this:
cp = ... # my classpath string
java_file = ... # the file with the java code
arg1 = ... # an argument string (always the same value)
cmd = [...] # <-- ???
process = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE) # stdin=PIPE is needed for process.stdin
items = [...] # my list of items
for item in items:
    arg2 = ... # calculated from item inside the python script
    process.stdin.write(bytes(..., 'utf-8')) # <-- ???
    process.stdin.flush()
    result = process.stdout.readline() # the result I need
... where I can't figure out what the two commands should be (in the lines that have the question marks).
Is what I want possible? Any help much appreciated!
Whether you can make your Python code more efficient depends on how the Java application is implemented. If the Java application can only receive input via command line arguments, then there's nothing you can do: you'll have to launch a new subprocess for every pair of arguments. But if the Java application is implemented to read from its standard input, then you can write to stdin from the Python side. What that looks like depends on the protocol the Java application expects.
You also ask what the Java command would look like if you can write to standard input. The command to launch the Java application is the same. What may be different is what command line arguments, if any, you need to pass to the Java application. And that again depends on how the Java application is implemented.
Note that reusing the same subprocess is cheaper than launching a new one for each pair of arguments, especially given Java's relatively high startup time. But whether you can do this depends on the Java application.
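To make the reuse idea concrete, here is a minimal sketch of the Python side of a line-based exchange with one long-lived child process. It assumes a protocol the Java code in the question does not necessarily implement: one "arg1,arg2" line in, one result line out, flushed after every line. So that the sketch runs without a JVM, a small Python child (CHILD_SOURCE) stands in for the Java application; process_items and the "fixed" default are illustrative names, not part of the original code.

```python
import subprocess
import sys

# Stand-in child: answers every "arg1,arg2" input line with one output line.
# A real Java application would have to implement the same (assumed) protocol.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    arg1, arg2 = line.rstrip('\\n').split(',')\n"
    "    print(f'{arg1}+{arg2}', flush=True)\n"
)


def process_items(items, arg1="fixed"):
    """Start the child once, then exchange one request/response line per item."""
    cmd = [sys.executable, "-c", CHILD_SOURCE]
    results = []
    with subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        encoding="utf-8",
    ) as proc:
        for item in items:
            arg2 = str(item)
            proc.stdin.write(f"{arg1},{arg2}\n")
            proc.stdin.flush()  # push the request out of Python's buffer
            results.append(proc.stdout.readline().rstrip("\n"))
        proc.stdin.close()  # end of input lets the child leave its loop
    return results
```

With a real Java program, cmd would be the usual java command plus whatever switches it into a stdin-reading mode. The child has to flush its output after each result, otherwise readline() can block indefinitely.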
If you control the Java application, you can modify it to read from its standard input. You may also want to consider simply having the Java application accept more than two command line arguments (e.g., if 40 command line arguments are given, process them as 20 pairs).
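The multi-argument alternative could be driven from Python roughly as follows. This is a sketch that assumes the Java application has been changed to accept any even number of arguments; batched_commands, max_pairs_per_call and the app.jar name in the usage note are hypothetical.

```python
from itertools import islice


def batched_commands(base_cmd, pairs, max_pairs_per_call=20):
    """Yield argv lists that each carry up to max_pairs_per_call (arg1, arg2) pairs."""
    iterator = iter(pairs)
    while True:
        chunk = list(islice(iterator, max_pairs_per_call))
        if not chunk:
            return
        # Flatten [(a1, b1), (a2, b2)] into [a1, b1, a2, b2].
        flat_args = [arg for pair in chunk for arg in pair]
        yield [*base_cmd, *flat_args]
```

Each yielded list can be passed to subprocess.run, for example with ['java', '-jar', 'app.jar'] as base_cmd. Keeping the batch size bounded matters because operating systems limit the total length of a command line.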
Here's an example Java application that can switch between "processing" two command line arguments and "processing" an unknown number of argument pairs from standard input.
package sample;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
public class Main {
    public static void main(String[] args) {
        if (args.length == 1 && args[0].equals("--use-stdin")) {
            processArgsFromStandardInput();
        } else if (args.length == 2) {
            processArgs(args[0], args[1]);
        } else {
            System.err.println("Illegal command line. Must be --use-stdin or 2 arguments.");
            System.exit(1);
        }
    }
    static void processArgsFromStandardInput() {
        Scanner scanner = new Scanner(System.in, StandardCharsets.UTF_8);
        scanner.useDelimiter(",");
        while (scanner.hasNext()) {
            String arg1 = scanner.next();
            String arg2 = scanner.next();
            processArgs(arg1, arg2);
        }
    }
    static void processArgs(String arg1, String arg2) {
        System.out.printf("Processing args: %s, %s%n", arg1, arg2);
    }
}
I chose to use a Scanner, but you can use whatever you want (e.g., BufferedReader, DataInputStream, etc.). The important part is that the source of data is System.in (standard input). I also chose to use "," as the delimiter between arguments. Again, that was an arbitrary choice, and you can use whatever you want, though it means the arguments can't contain commas themselves (I don't provide a way to "escape" a comma). Note that UTF-8 encoding with commas as the delimiter is the "protocol" I mentioned earlier.
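If arguments may contain commas, one option is to percent-encode each argument before writing it, so a literal comma can only ever be a delimiter. The sketch below shows the Python side only and is an assumption layered on top of the protocol above: the sample Java application does not decode anything, so it would need a matching decode step (for instance with java.net.URLDecoder). encode_pair and decode_stream are illustrative names.

```python
from urllib.parse import quote, unquote


def encode_pair(arg1, arg2):
    """Percent-encode both arguments so neither contains a literal comma."""
    return f"{quote(arg1, safe='')},{quote(arg2, safe='')},".encode("utf-8")


def decode_stream(data):
    """Inverse of encode_pair, included only to check the round trip in Python."""
    fields = data.decode("utf-8").split(",")[:-1]  # drop the empty tail after the final comma
    values = [unquote(field) for field in fields]
    return list(zip(values[0::2], values[1::2]))
```

The bytes returned by encode_pair can be written to proc.stdin in place of the plain f-string used in the Python script.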
And here's an example Python script that invokes the Java application (compiled and packaged into a JAR file) twice, once for each "mode":
import subprocess
import sys
from subprocess import Popen, PIPE
from time import time
def measure_time(func):
    def wrapper(*args):
        start = time()
        func(*args)
        end = time()
        print(f'Function took {end - start:.2f} seconds.')
    return wrapper
# implementation from https://stackoverflow.com/a/5389547/6395627
def pairwise(iterable):
    a = iter(iterable)
    return zip(a, a)
@measure_time
def invoke_args(jarfile, args):
    for arg1, arg2 in pairwise(args):
        subprocess.run(['java', '-jar', jarfile, arg1, arg2])
@measure_time
def invoke_stdin(jarfile, args):
    with Popen(['java', '-jar', jarfile, '--use-stdin'], stdin=PIPE) as proc:
        for arg1, arg2 in pairwise(args):
            proc.stdin.write(f'{arg1},{arg2},'.encode())
if __name__ == '__main__':
    jarfile = sys.argv[1]
    args = [f'arg{i}' for i in range(1, 41)]
    print('========== COMMAND LINE ARGS ==========')
    invoke_args(jarfile, args)
    print()
    print('========= STANDARD INPUT ==========')
    invoke_stdin(jarfile, args)
    print()
If you invoke the above Python script (passing an appropriate JAR file path and assuming you have Java on your path), then you should see output similar to:
========== COMMAND LINE ARGS ==========
Processing args: arg1, arg2
Processing args: arg3, arg4
Processing args: arg5, arg6
Processing args: arg7, arg8
Processing args: arg9, arg10
Processing args: arg11, arg12
Processing args: arg13, arg14
Processing args: arg15, arg16
Processing args: arg17, arg18
Processing args: arg19, arg20
Processing args: arg21, arg22
Processing args: arg23, arg24
Processing args: arg25, arg26
Processing args: arg27, arg28
Processing args: arg29, arg30
Processing args: arg31, arg32
Processing args: arg33, arg34
Processing args: arg35, arg36
Processing args: arg37, arg38
Processing args: arg39, arg40
Function took 2.46 seconds.
========= STANDARD INPUT ==========
Processing args: arg1, arg2
Processing args: arg3, arg4
Processing args: arg5, arg6
Processing args: arg7, arg8
Processing args: arg9, arg10
Processing args: arg11, arg12
Processing args: arg13, arg14
Processing args: arg15, arg16
Processing args: arg17, arg18
Processing args: arg19, arg20
Processing args: arg21, arg22
Processing args: arg23, arg24
Processing args: arg25, arg26
Processing args: arg27, arg28
Processing args: arg29, arg30
Processing args: arg31, arg32
Processing args: arg33, arg34
Processing args: arg35, arg36
Processing args: arg37, arg38
Processing args: arg39, arg40
Function took 0.16 seconds.
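In the script above, the Java output goes straight to the console. If the results are needed back in Python, one way is to send all pairs in a single write and capture the child's output. The sketch below does that with subprocess.run. It uses a small Python child (CHILD_SOURCE) as a stand-in for "java -jar <jarfile> --use-stdin" so that it runs without a JVM; run_batch is an illustrative name.

```python
import subprocess
import sys

# Stand-in for the Java application in --use-stdin mode: reads comma-delimited
# tokens from stdin and prints one line per pair.
CHILD_SOURCE = (
    "import sys\n"
    "tokens = sys.stdin.read().split(',')[:-1]\n"
    "for arg1, arg2 in zip(tokens[0::2], tokens[1::2]):\n"
    "    print(f'Processing args: {arg1}, {arg2}')\n"
)


def run_batch(cmd, args):
    """Send every argument in one write and return the child's output lines."""
    payload = "".join(f"{arg}," for arg in args)
    completed = subprocess.run(
        cmd,
        input=payload,        # written to the child's stdin, then stdin is closed
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,
        encoding="utf-8",
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.splitlines()
```

Calling run_batch([sys.executable, '-c', CHILD_SOURCE], ['arg1', 'arg2', 'arg3', 'arg4']) returns one "Processing args: ..." line per pair. The trade-off is that results only become available after the child has processed the whole batch and exited.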