After a very long hunt for a related bug, I ran into this strange behavior. If on Linux I run a single JNI method that does a select:
#include <jni.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

JNIEXPORT void JNICALL Java_SelectJNI_select(JNIEnv *env, jobject thisObj) {
    // Print the current PID
    fprintf(stderr, "PID: %d\n", (int) getpid());
    // Wait for 30 seconds (timeout lives on the stack, so nothing to free)
    struct timeval timeout = { .tv_sec = 30, .tv_usec = 0 };
    select(0, NULL, NULL, NULL, &timeout);
}
and then run the executable under strace, the select is not executed by the PID I printed but by what looks like the PID of a child, while the original process is waiting on a mutex (this doesn't happen if I make the same call in a small plain C program).
Say strace -f -o strace_output.txt java SelectJNI prints:
PID: 46811
then grep select\( strace_output.txt will return:
46812 select(0, NULL, NULL, NULL, {tv_sec=30, tv_usec=0} <unfinished ...>
My guess is that JNI is forking and somehow replacing the original select with its own wrapped version, probably to remain responsive.
I have a lot of questions, but the ones I care most about are:
select is actually running?

The JVM may indeed fork, but it does so to create new JVM threads rather than whole processes. 46811 is the PID of the process; the thread that actually runs your code has TID 46812 (which is what strace prints) and still belongs to PID 46811. Replacing getpid with gettid in the sample should lead to consistent output.