I have a Java program that needs to process a long series of input strings. To do this it goes through each string, passes it to a Process (a Python script), gets the result from the Process's OutputStream, then moves to the next string. However, I find that after running for a number of hours the program freezes, with the Java program waiting for output from Python.
To debug, I made a simpler version of my program that uses small strings, doesn't do any buffering on the Java side and doesn't modify the data in the Python script. But now I find it's freezing in a different place, with Java trying to flush data to the Python script and the Python script trying to flush a result to Java. The number of items it's able to process before freezing varies slightly between runs of the program, and increasing the length of the strings greatly decreases the number of items that can be processed.
Java program:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import static java.nio.charset.StandardCharsets.UTF_8;
import static java.util.concurrent.TimeUnit.SECONDS;

public class Main {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process process = start("python", "test.py");
        for (int i = 0; i < 1000; i++) {
            System.out.println(i);
            processText("test string test string test string test string ", process);
        }
        process.getOutputStream().close();
        boolean finished = process.waitFor(10, SECONDS);
        if (!finished) {
            process.destroyForcibly();
        }
    }

    public static Process start(String... command) throws IOException {
        ProcessBuilder processBuilder = new ProcessBuilder(command);
        processBuilder.redirectError(ProcessBuilder.Redirect.INHERIT);
        return processBuilder.start();
    }

    public static String processText(String text, Process process) throws IOException {
        byte[] bytes = (text + "\n").getBytes(UTF_8);
        OutputStream outputStream = process.getOutputStream();
        System.out.println("Writing...");
        outputStream.write(bytes);
        System.out.println("Done!");
        outputStream.flush();
        System.out.println("Reading...");
        String result = readLn(process);
        System.out.println("Got it!");
        return result;
    }

    public static String readLn(Process process) throws IOException {
        InputStream inputStream = process.getInputStream();
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        byte newlineByte = "\n".getBytes(UTF_8)[0];
        byte lastByte = -1;
        while (lastByte != newlineByte) {
            lastByte = (byte) inputStream.read();
            byteArrayOutputStream.write(lastByte);
        }
        return byteArrayOutputStream.toString(UTF_8);
    }
}
Python script:
import sys
import io

in_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
out_stream = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

def output(s):
    out_stream.write(s)

text = in_stream.readline()
while text != "":
    print("Outputting result", file=sys.stderr)
    output(text)
    print("Output done!", file=sys.stderr)
    output("\n")
    print("Flushing", file=sys.stderr)
    out_stream.flush()
    print("Flushed", file=sys.stderr)
    text = in_stream.readline()
Output from Java:
0
Writing...
Done!
Reading...
Got it!
1
Writing...
Done!
Reading...
Got it!
.
.
.
379
Writing...
Done!
Reading...
Got it!
380
Writing...
Done! [Freezes here]
Output from Python (via stderr):
Outputting result
Output done!
Flushing
Flushed
.
.
.
Outputting result
Output done!
Flushing
Flushed
Outputting result
Output done!
Flushing [Freezes here]
When I force the Java program to stop I get this additional output from Python's stderr:
Flushed
Outputting result
Output done!
Flushing
Traceback (most recent call last):
File "...\test.py", line 17, in <module>
out_stream.flush()
OSError: [Errno 22] Invalid argument
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp1252'>
OSError: [Errno 22] Invalid argument
If I use print() and input() instead of in_stream and out_stream I manage to get through all 1000 items. However, I want to ensure UTF-8 encoding when passing data between Java and Python so I can have all Unicode characters and won't lose any data. That's why I'm using TextIOWrapper, based on what I read online (I figured this would be the most efficient approach for large amounts of data). However, this last error output seems to be saying it's using cp1252 and not UTF-8. How do I fix that?
I'm using Windows 10, Java 17 and Python 3.10
Edit: I think the error is saying sys.stdout has cp1252 encoding; in_stream and out_stream say they have utf-8 encoding, so I think the encoding is fine. I can now fix the freezing just by setting stdin to a UTF-8 TextIOWrapper and using input() (stdin.readline() doesn't work, it freezes). But I don't know why it wouldn't work the original way and why this fixes the problem. If anyone can explain and outline how to avoid this kind of problem in the future (could out_stream also potentially cause freezing?) I will accept their answer.
import sys
import io

sys.stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
out_stream = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

def output(str):
    out_stream.write(str)

def read_text():
    try:
        return input()
    except EOFError:
        return ""

text = read_text()
while text != "":
    print("Outputting result", file=sys.stderr)
    output(text)
    print("Output done!", file=sys.stderr)
    output("\n")
    print("Flushing", file=sys.stderr)
    out_stream.flush()
    print("Flushed", file=sys.stderr)
    text = read_text()
A sub-process started by ProcessBuilder has three streams (STDIN, STDOUT and STDERR), and the I/O between Java and the sub-process freezes if the buffer of one of those streams fills up. If you change the "Got it!" line to print result, it will show that the stream you are reading contains more than one line of output per line sent:
System.out.println("Got it! "+result);
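Thus deadlock will occur (eventually), because your Java app, which does one line write and one line read, won't keep up with the Python output, which sends 2 lines back: readline() keeps the trailing newline, so output(text) already ends the line and output("\n") adds a second, empty one. That is also why the input() version gets through: input() strips the newline, so only one line comes back per line sent. The easy fix here, which would work only for your test program, would be to remove the duplicate newline output("\n").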
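To make the imbalance concrete, here is a toy model of the unread backlog. It is only a sketch: the class name, the `capacityBytes` parameter and the numbers passed to it are illustrative assumptions, not the real pipe or buffer sizes on any platform. The child adds two lines per round, the parent removes one, and the method reports the round at which the backlog would no longer fit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of an unread-output backlog; capacities are made up for illustration. */
final class BacklogModel {

    /**
     * Returns the round at which the unread backlog would exceed capacityBytes,
     * or maxRounds if it never does.
     */
    static int roundsUntilFull(String text, int capacityBytes, int maxRounds, boolean extraNewline) {
        Deque<byte[]> unread = new ArrayDeque<>();
        long unreadBytes = 0;
        for (int round = 0; round < maxRounds; round++) {
            // The child echoes the line, newline included.
            byte[] echoed = (text + "\n").getBytes(StandardCharsets.UTF_8);
            unread.addLast(echoed);
            unreadBytes += echoed.length;
            if (extraNewline) {
                // The duplicated output("\n") adds a second, empty line.
                byte[] extra = "\n".getBytes(StandardCharsets.UTF_8);
                unread.addLast(extra);
                unreadBytes += extra.length;
            }
            if (unreadBytes > capacityBytes) {
                return round; // the child would block here while flushing
            }
            // The parent reads exactly one line per round.
            unreadBytes -= unread.removeFirst().length;
        }
        return maxRounds;
    }
}
```

In this model a longer `text` fills the hypothetical buffer in fewer rounds, which matches the observation in the question, and with `extraNewline` set to `false` the backlog never grows.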
However, in general, with any ProcessBuilder call it is possible for the sub-process to freeze because the Java code isn't reading STDOUT, while at the same time the Java code appears to freeze writing to STDIN because the sub-process is blocked on STDOUT. And vice versa. So never read and write in the same thread, and never read from both STDOUT and STDERR in one thread.
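One way to follow the "one thread per stream" rule is a small helper that drains a stream on its own thread. This is a sketch under assumptions: `StreamDrainer` and `drainAsync` are hypothetical names, and collecting every line in memory is only reasonable for modest output (a long-running job would handle each line as it arrives instead).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical helper: drains one stream on a dedicated thread. */
final class StreamDrainer {

    /** Starts a daemon thread that reads UTF-8 lines until end of stream. */
    static CompletableFuture<List<String>> drainAsync(InputStream stream, String threadName) {
        CompletableFuture<List<String>> done = new CompletableFuture<>();
        Thread thread = new Thread(() -> {
            List<String> lines = new ArrayList<>();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(stream, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
                done.complete(lines);
            } catch (IOException e) {
                // Surface the failure to whoever waits on the future.
                done.completeExceptionally(e);
            }
        }, threadName);
        thread.setDaemon(true);
        thread.start();
        return done;
    }
}
```

With a real process you would call it once for `process.getInputStream()` and, if STDERR is not inherited or merged, once more for `process.getErrorStream()`, then call `join()` on each future after the input has been written.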
The best fix is to use separate consumer / producer threads for STDIN/STDOUT/STDERR. As you've used INHERIT mode for STDERR, that stream should be handled OK; otherwise use a new thread for STDERR, or merge it with STDOUT by calling processBuilder.redirectErrorStream(true);
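The two configuration-only options for STDERR can be sketched as follows. The class and method names are hypothetical; the builders are only configured here, not started.

```java
import java.lang.ProcessBuilder.Redirect;

/** Hypothetical helper showing two ways to keep STDERR from filling up. */
final class StderrOptions {

    /** The child's STDERR goes straight to the Java process's own STDERR. */
    static ProcessBuilder inheritStderr(String... command) {
        return new ProcessBuilder(command).redirectError(Redirect.INHERIT);
    }

    /** STDERR is merged into STDOUT, so a single reader thread drains both. */
    static ProcessBuilder mergeStderr(String... command) {
        return new ProcessBuilder(command).redirectErrorStream(true);
    }
}
```

If STDERR is neither inherited nor merged, it needs its own reader thread, just like STDOUT.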
Split processText into 2 methods: processText, which does outputStream.write and no readLn, and readText, which does the readLn part. Run processText in a background thread and use readText in the main thread.
After calling waitFor, make sure you join on the background thread to ensure it has finished.
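The split described above might look like the sketch below. It works on plain streams so it can be tried without Python; `SplitReadWrite`, `writeAll`, `readAll` and `exchange` are hypothetical names, and an exception in the writer thread is simply left to the default uncaught-exception handler.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: writes happen on a background thread, reads on the caller's thread. */
final class SplitReadWrite {

    /** Writes every line to the child's STDIN, then closes it so the child sees end of input. */
    static void writeAll(List<String> lines, OutputStream toChild) {
        try (OutputStream out = toChild) {
            for (String line : lines) {
                out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                out.flush();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads lines from the child's STDOUT until end of stream. */
    static List<String> readAll(InputStream fromChild) {
        List<String> results = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fromChild, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                results.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return results;
    }

    /** Starts the writer in the background, reads on this thread, then joins the writer. */
    static List<String> exchange(List<String> lines, OutputStream toChild, InputStream fromChild) {
        Thread writer = new Thread(() -> writeAll(lines, toChild), "STDIN-writer");
        writer.start();
        List<String> results = readAll(fromChild);
        try {
            writer.join(); // make sure every input was actually sent
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the writer", e);
        }
        return results;
    }
}
```

With a real process this would be called as `exchange(inputs, process.getOutputStream(), process.getInputStream())`, alongside `process.waitFor(10, SECONDS)`. Because the reader no longer assumes one reply line per input line, an extra line from the child shows up in the result list instead of stalling the exchange.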
Here is an example that launches a process with background thread handling; you would need to adjust it to handle STDIN in its own thread.
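Also, Python buffers its output, so you can avoid the need for regular flushing by launching with python -u to make Python's standard output unbuffered. Note that this applies to sys.stdout and sys.stderr; a TextIOWrapper you construct yourself, such as out_stream, keeps its own buffer and still needs flush() unless it is created with write_through=True.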
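Passing the flag from Java is just an extra element in the command list. A minimal sketch, assuming `python` is on the PATH as in the question; the class and method names are hypothetical, and the builder is configured but not started.

```java
import java.util.List;

/** Hypothetical helper that configures a launcher for a script with Python's -u flag. */
final class UnbufferedPython {

    static ProcessBuilder unbuffered(String script) {
        // "-u" must come before the script name so Python treats it as an interpreter option.
        return new ProcessBuilder(List.of("python", "-u", script))
                .redirectError(ProcessBuilder.Redirect.INHERIT);
    }
}
```

Calling `UnbufferedPython.unbuffered("test.py").start()` would then launch the script the same way the question's `start` method does, with the extra flag.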
This version will not freeze, whichever way you read stdin in Python, because it separates the stdin writes from the stdout reads:
// Imports needed at the top of the file, in addition to those in the question:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

public static void main_py(String... args) throws IOException, InterruptedException {
    Process process = start("python", "test.py");

    // Handle STDOUT in a different thread from STDIN
    Runnable task = () -> {
        System.out.println("task START");
        try (var from = new BufferedReader(new InputStreamReader(process.getInputStream(), UTF_8))) {
            String result;
            while ((result = from.readLine()) != null) {
                System.out.println("Got it! " + result);
            }
        } catch (IOException io) {
            throw new UncheckedIOException(io);
        }
        System.out.println("task END");
    };
    Thread bg = new Thread(task, "STDOUT");
    bg.start();

    for (int i = 0; i < 1000; i++) {
        System.out.println(i);
        processText2("test string test string test string test string " + i, process);
    }

    // Closing STDIN lets the Python script see end of input and exit
    process.getOutputStream().close();
    boolean finished = process.waitFor(10, SECONDS);
    if (!finished) {
        // Kill the child first so its STDOUT closes and the reader thread can end
        process.destroyForcibly();
    }
    bg.join();
    System.out.println("main END");
}

// Write-only half of processText: sends one line and does not read a reply
public static void processText2(String text, Process process) throws IOException {
    byte[] bytes = (text + "\n").getBytes(UTF_8);
    OutputStream outputStream = process.getOutputStream();
    System.out.println("Writing..." + text);
    outputStream.write(bytes);
    outputStream.flush();
    System.out.println("Done!");
}