Tags: java, junit, graalvm

Boost JUnit performance using GraalVM


I have read some high-level articles about the new GraalVM and thought it would be a good idea to use it to enhance JUnit test performance, especially for big test suites that run in forked mode.

According to the SO question "Does GraalVM JVM support java 11?" I added the following to the VM arguments of a unit test run configuration in my Eclipse (jee-2019-12, JUnit 4):

-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler

Effect: The unit test takes somewhat longer than without these switches (2800 ms with, 2200 ms without, reproducible).

Did I miss something? Or did I misunderstand the promises of enhanced boot time in GraalVM?


Solution

  • Yes, unfortunately, it feels like there's some misunderstanding at play here. I'll try to elaborate on a number of issues regarding performance and GraalVM.

    GraalVM is a polyglot runtime that can run JVM applications. Normally it does so by running the Java HotSpot VM (the same as in OpenJDK, for example) with the top-tier optimizing just-in-time (JIT) compiler replaced by its own GraalVM compiler. Simplifying somewhat: during a run, the JVM loads the class files, verifies them, starts interpreting them, and then compiles them with a series of compilers that go from the fastest-to-compile to the most-optimizing. So the longer your application runs and the more you use the same methods, the more progressively they get compiled to better and better machine code.
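    This warm-up effect can be observed directly. The sketch below (a hypothetical micro-benchmark, not tied to any particular test suite; class and method names are made up) times the same method over several rounds; on a JIT-ing JVM, later rounds are typically faster than the first, as the method moves from the interpreter through the compiler tiers:

```java
public class WarmupDemo {
    // Some repeatable CPU-bound work for the JIT to optimize.
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Later rounds are usually faster: the method is interpreted first,
        // then compiled by progressively more optimizing JIT tiers.
        for (int round = 1; round <= 5; round++) {
            long t0 = System.nanoTime();
            long result = work(5_000_000);
            long micros = (System.nanoTime() - t0) / 1_000;
            System.out.println("round " + round + ": " + micros
                    + " us (result " + result + ")");
        }
    }
}
```

    (Exact numbers vary by machine and JVM; for serious measurements use a harness like JMH rather than hand-rolled timing.)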

    The GraalVM compiler is really good at optimizing code, so when your application runs long enough for it to kick in, the result is usually better than what other compilers can show. This leads to better peak performance, which is great for medium/long-running workloads.

    Your unit test run takes 2 seconds, which is really not much time to execute code repeatedly, gather a profile, and engage the optimizing compiler. It might also be that your particular code patterns and workload are really well suited to C2 (HotSpot's default top-tier JIT), so it's hard to do better. Remember, C2 is an excellent JIT that has been developed for at least two decades, and its results are really good too.
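    If you want to see how much of such a short run the JVM actually spends compiling, the standard `java.lang.management` API exposes this via `CompilationMXBean` (a small diagnostic sketch; the printed name and times depend on your JVM):

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitInfo {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        // e.g. "HotSpot 64-Bit Tiered Compilers" on a stock OpenJDK
        System.out.println("JIT compiler: " + jit.getName());
        if (jit.isCompilationTimeMonitoringSupported()) {
            // Cumulative wall-clock time the JVM has spent in JIT compilation
            System.out.println("Time spent compiling so far: "
                    + jit.getTotalCompilationTime() + " ms");
        }
    }
}
```

    In a 2-second test run, only a fraction of that time goes to (and benefits from) the optimizing tier.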

    Now, there's also another option GraalVM gives you -- GraalVM Native Image, which allows you to compile your code ahead of time (AOT) into a native binary that does not depend on the JVM and does not load, verify, or initialize classes, so the startup of such a binary, until it gets to do useful "business" work, is much better. This is a very interesting option for shorter-running workloads or resource-constrained environments (the binary doesn't need to do JIT compilation, so it doesn't need resources for it, making runtime resource consumption smaller). However, to use this approach you need to compile your application with the native-image utility from GraalVM, and that compilation can take longer than your workload that runs in 2 seconds.

    Now, in the setup you're describing, you're not using the GraalVM distribution but enabling the GraalVM compiler in your OpenJDK (I assume) distribution. The options you specify turn on the GraalVM compiler as the top-tier JIT compiler. There are 2 main differences at play compared to what you'd get by running java from the GraalVM distribution:

    • The compiler is not up to date: at some point in time the GraalVM compiler sources were pulled into the OpenJDK project, and that's how they ended up in your distribution.
    • The GraalVM compiler is written in Java, and in your setup it is executed as normal Java code, so it might first need to JIT-compile itself, which leads to a longer warm-up phase of the run, a JIT profile somewhat polluted with its own code, etc.
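    To double-check which configuration your tests actually ran with, you can query the JVMCI-related flags at runtime via the HotSpotDiagnosticMXBean (a diagnostic sketch; the flags simply don't exist on JVMs built without JVMCI support, hence the catch):

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class JvmciCheck {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        System.out.println("VM: " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.vm.version"));
        for (String flag : new String[] {"EnableJVMCI", "UseJVMCICompiler"}) {
            try {
                // getVMOption throws IllegalArgumentException for unknown flags
                System.out.println(flag + " = " + diag.getVMOption(flag).getValue());
            } catch (IllegalArgumentException e) {
                System.out.println(flag + " is not available on this JVM");
            }
        }
    }
}
```

    Run this with and without your `-XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler` arguments to confirm the switch actually took effect in the Eclipse run configuration.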

    In the GraalVM distribution, which I'd encourage you to try for this experiment, the GraalVM compiler is up to date and is by default precompiled as a shared library using the GraalVM Native Image technology, so at runtime it doesn't need to be JIT-compiled, and its warmup is much closer to the characteristics of C2.

    Still, 2 seconds might not be enough time for the optimizing compiler to show major differences. It could also be that the tests run a lot of code only once, and the body of hot code that gets JIT-compiled is not significant enough.