Intel MPI mpirun does not terminate using java Process.destroy()


My Intel MPI version is impi/5.0.2.044/intel64 installed on a RHEL machine.

I am using java to invoke an MPI program using the following code:

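import java.io.File;
import java.io.IOException;
import java.lang.ProcessBuilder.Redirect;

ProcessBuilder builder = new ProcessBuilder();
// Each argument must be its own string; a single "mpirun ./myProgram"
// string would be looked up as the name of one executable.
builder.command("mpirun", "./myProgram");
builder.redirectError(Redirect.to(new File("stderr")));
builder.redirectOutput(Redirect.to(new File("stdout")));
Process p = null;
try {
    p = builder.start();
} catch (IOException e) {
    e.printStackTrace();
}
// Process has started here
p.destroy();
try {
    // i = 143, i.e. 128 + 15 (terminated by SIGTERM)
    int i = p.exitValue();
} catch (IllegalThreadStateException e) {
    // thrown if the process has not exited yet
}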

But even after exitValue() returns without throwing an exception, ps aux still shows a bunch of ./myProgram processes, and the program keeps writing result files as if it had never been killed. It terminates only after it finishes all of its calculation.

Currently, the only way I have found to terminate ./myProgram is to terminate the Java program itself by pressing Ctrl+C in its console.

My intention is to stop the calculation immediately and let the Java program schedule some other calculation. Is there any workaround to force all MPI instances to terminate, or at least to guarantee termination within a small, bounded amount of time (e.g. 30 s or 1 min of polling)?


Solution

  • The problem is that the JDK implementation of destroy() sends SIGTERM, which shuts down mpirun hard. This matches the exit value 143 you observed (128 + 15, where 15 is SIGTERM). See here for the relevant JDK source.

    You need to send SIGINT to give MPI a chance to shut down gracefully.

    E.g. Runtime.getRuntime().exec("kill -INT <pid>");

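    You can get the PID by invoking mpirun with --report-pid, if your mpirun supports it. Check the man page first: the option is documented for Open MPI's mpirun and may not be available in Intel MPI.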

    edit

    Alternatively, you can use reflection to find the PID of a process you started under a UNIX-like OS (stolen from here). Since we are already relying on kill and signals, the UNIX-only restriction should not matter.

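    int pid = -1;  // stays -1 if the PID cannot be determined
    // "java.lang.UNIXProcess" is the class name on Java 8 and earlier;
    // on Java 9+ call p.pid() instead of using reflection.
    if (p.getClass().getName().equals("java.lang.UNIXProcess")) {
      /* get the PID on unix/linux systems */
      try {
        Field f = p.getClass().getDeclaredField("pid");  // java.lang.reflect.Field
        f.setAccessible(true);
        pid = f.getInt(p);
      } catch (Throwable e) {
        e.printStackTrace();  // reflection failed; pid is still -1
      }
    }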
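On Java 9 or later, the same idea (send SIGINT, then wait a bounded time) can be sketched without reflection, because Process.pid() is part of the standard API. The following is only a minimal sketch, not a tested fix for Intel MPI: the class name GracefulStop, the helpers killCommand and stop, and the 30-second grace period are made up for illustration, and it assumes a Unix-like system with a kill utility on the PATH. If the grace period expires, it falls back to force-killing the launcher and any descendant processes it can still see on the local machine. Ranks running on other nodes are out of its reach.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

final class GracefulStop {

    // Builds the argument list for the external kill utility,
    // e.g. ["kill", "-INT", "1234"]. Hypothetical helper.
    static List<String> killCommand(String signal, long pid) {
        return Arrays.asList("kill", "-" + signal, Long.toString(pid));
    }

    // Sends SIGINT to the process, waits up to graceSeconds for it to exit,
    // and otherwise force-kills it together with its local descendants.
    // Returns true if the process exited within the grace period.
    static boolean stop(Process p, long graceSeconds)
            throws IOException, InterruptedException {
        // Process.pid() requires Java 9 or later.
        new ProcessBuilder(killCommand("INT", p.pid()))
                .inheritIO().start().waitFor();
        if (p.waitFor(graceSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        // Snapshot the descendants while the launcher is still alive,
        // because they can no longer be found through it once it is gone.
        List<ProcessHandle> descendants =
                p.descendants().collect(Collectors.toList());
        p.destroyForcibly();
        descendants.forEach(ProcessHandle::destroyForcibly);
        p.waitFor();
        return false;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java GracefulStop <command> [args...]");
            return;
        }
        Process p = new ProcessBuilder(args).inheritIO().start();
        Thread.sleep(2000);               // let the job run briefly
        boolean graceful = stop(p, 30);   // 30 s grace period (assumption)
        System.out.println(graceful ? "stopped after SIGINT" : "had to force-kill");
    }
}
```

It could be invoked as, for example, java GracefulStop mpirun ./myProgram. Whether SIGINT alone makes every rank exit depends on the MPI runtime, so the forced fallback is there to bound the waiting time.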