
Java Native Interface sneaky forking behavior


After a very long hunt for a related bug, I came across this strange behavior:

If on Linux I run a single JNI method to do a select:

#include <jni.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

JNIEXPORT void JNICALL Java_SelectJNI_select(JNIEnv *env, jobject thisObj) {
  // Print the current PID
  fprintf(stderr, "PID: %d\n", (int) getpid());

  // Wait for 30 seconds (select with no descriptors acts as a sleep)
  struct timeval timeout = { .tv_sec = 30, .tv_usec = 0 };
  select(0, NULL, NULL, NULL, &timeout);
}

and then I run the program under strace, the select is not executed under the PID I printed but under the PID of a child, while the original process is actually waiting on a mutex (this doesn't happen if I make the same call in a plain, small C program).

Say strace -f -o strace_output.txt java SelectJNI prints:

PID: 46811 

then grep select\( strace_output.txt will return:

46812 select(0, NULL, NULL, NULL, {tv_sec=30, tv_usec=0} <unfinished ...>

My guess is that JNI is forking and in some way replacing the original select with its own wrapped version, probably to remain responsive.

I have a lot of questions, but the ones I care most about are:

  1. Is my hypothesis correct? Is JNI replacing functions under my feet?
  2. Is this behavior documented somewhere?
  3. The process where the actual select is invoked seems always to be that of the first child. Can I rely on that? If not, how do I find out where select is actually running?

Solution

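  • The JVM is not forking a separate process, and it is not replacing select. What strace -f shows is a thread: on Linux the JVM creates its threads with clone(), which strace follows the same way it follows fork, and the number in the first column of the output is the thread ID (TID), not the PID. While 46811 is the PID, the thread that actually runs your code has TID 46812 and still belongs to process 46811. The java launcher typically runs main in a newly created thread rather than in the initial one, which is left waiting for it to finish; that is the wait you see on the original PID. Replacing getpid with gettid in the sample (or syscall(SYS_gettid) on glibc older than 2.30, which has no gettid wrapper) should lead to consistent output.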