java python subprocess jython jpype

What is the best way to run the same java function repeatedly in python?


I am working on a project that requires me to repeatedly run a Java function from Python. It's like designing a learning algorithm in Python, but the value function is provided in Java.

So what is the best practice for this scenario? Should I use subprocess.run() to call the Java function every time, or should I use something like Py4J, Jython, or JPype? What's the difference between using subprocess.run() and the others?

Efficiency is my top concern, since I need to call the same Java function over and over.


Solution

    • Using subprocess has two problems. If neither one is relevant, it'll work fine.
      • If you're sending large amounts of data back and forth, you have to serialize it into some format and pass it via files and command-line arguments, pipes, or sockets, which can be slow.
      • If you're calling a whole lot of short functions instead of one occasional huge one, you'll be spending more time setting up and tearing down the JVM (and warming up the JIT) than doing actual work.
    • Jython has two problems. Again, if neither one affects you, it'll work fine.
      • It can't use many popular third-party libraries, because they're C extensions built for CPython.
      • It's out of date. The latest version implements Python 2.7, which was less than two years away from end of support when this was written (Python 2 reached end of life on January 1, 2020).
    • JPype has one problem, but it's a doozy. If the current fork does what you need and has no bugs blocking you, it may be OK anyway.
      • It's a project that was abandoned over a decade ago. Someone else picked it up and knocked it into shape a few years ago, and the current maintainer keeps it running and occasionally merges patches for things like working in 64-bit Cygwin or updating to OS X 10.9, but it's not exactly a vibrant project with major support behind it.
    • Py4J has two problems.
      • It's incomplete. Not unusable, and not completely moribund, but there hasn't been any visible work on it in over a year, and nobody seems interested in anything beyond the minimal functionality Apache Spark needs.
      • Behind your back, it does the same kind of serialization you'd do with subprocess, plus extra work on every call you make, and the FAQ justifies this by saying performance is not a priority. (Spark just ignores all of that and uses its own channels for everything.)
      • For a more minimal approach (just starting up a JVM and opening a socket to it), rolling your own may be better than subprocess, because you don't have to keep starting and tearing down a JVM. However, writing a socket protocol on both sides is a bit more work than storing files and passing filenames on the command line. (Not a huge hurdle, but a problem if you've never done this kind of thing before.)
    • You may also want to look at transpilers. I don't know much about any of them, but I've talked to people who are using BeeWare's VOC to compile Python 3.4 code into Java class files that they then use together with their native Java code. I'm pretty sure this won't work if you're using any C extensions, but if that's not a problem for you, it might be worth considering.
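To make the subprocess trade-off concrete, here is a minimal Python sketch of the two process-based approaches. `call_once` starts a new process for every call, as plain `subprocess.run()` does, while `PersistentWorker` starts the process once and keeps talking to it, so the JVM start-up and JIT warm-up are paid only once. It assumes a hypothetical Java program that reads one JSON request per line on stdin and writes one JSON reply per line on stdout; that protocol and the names `call_once` and `PersistentWorker` are illustrative, not part of any library. It uses pipes rather than a socket to stay short, but the same one-message-per-line framing works over a socket.

```python
import json
import subprocess
from typing import Any, Sequence


def call_once(cmd: Sequence[str], request: Any) -> Any:
    """Run a fresh process for a single request (the plain subprocess.run() route).

    Each call pays full process start-up (for Java: JVM start-up and JIT
    warm-up) plus JSON serialization of the request and the reply.
    """
    result = subprocess.run(
        list(cmd),
        input=json.dumps(request) + "\n",  # one JSON request on stdin
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)  # one JSON reply on stdout


class PersistentWorker:
    """Start the process once and send it many requests over its pipes.

    Hypothetical protocol: the worker reads one JSON request per line on
    stdin and answers with exactly one JSON reply per line on stdout.
    """

    def __init__(self, cmd: Sequence[str]) -> None:
        self._proc = subprocess.Popen(
            list(cmd),
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def call(self, request: Any) -> Any:
        if self._proc.stdin is None or self._proc.stdout is None:
            raise RuntimeError("worker pipes are not open")
        self._proc.stdin.write(json.dumps(request) + "\n")
        self._proc.stdin.flush()  # make sure the request reaches the worker now
        line = self._proc.stdout.readline()  # blocks until the reply line arrives
        if not line:
            raise RuntimeError("worker exited before replying")
        return json.loads(line)

    def close(self) -> None:
        # Closing stdin sends EOF, which should make the worker leave its loop.
        if self._proc.stdin is not None:
            self._proc.stdin.close()
        self._proc.wait()
        if self._proc.stdout is not None:
            self._proc.stdout.close()

    def __enter__(self) -> "PersistentWorker":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

With a real Java worker you would pass something like `["java", "-cp", "value-function.jar", "ValueFunctionServer"]` as `cmd` (hypothetical jar and class names). The Java side must end each reply with a newline and make sure its output is flushed, otherwise `readline()` on the Python side will block waiting for it.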