Java autovectorization


I'm trying to understand when the JDK will autovectorize. Despite googling, reading and experimenting, I still have the following set of questions. Given a simple loop as follows:

for (int i = 0, size = size(); i < size; i++) {
   a[i] = b[i] * c[i];
   method1();
   // someObject.method2();
   // someHashMap.put(b[i], c[i]);
}
  1. Why is it necessary for the method call "method1" (that appears within the loop) to be inlined for autovectorization to occur? (I can't understand why that must be necessary...)
  2. Perhaps this is a "silly" question, but what if "someObject.method2()" were uncommented? (And let's assume that method2 is a huge method, i.e. many lines.) Would that prevent autovectorization too? What if method2 were a tiny method (e.g. just 1 or 2 lines)?
  3. What if the "someHashMap" line were uncommented? Would the fact that we have an object/variable that would be shared across all the SIMD lanes cause the autovectorization to fail too? (I can't see how it could work unless the JDK would somehow insert a "synchronized" keyword automatically when accessing the common object/var "someHashMap".)
  4. It seems to me that the "streaming" interface would solve the problem implied in question #3 directly above, since the "collector" logic in streams would automatically take care of merging individual hashmaps, and so we would not need any "synchronized" keyword. (In general, it almost seems like the streaming API is a perfect API to allow the JDK to automatically use autovectorization, so long as there are no "outside vars", i.e. no side effects, when creating the streaming code.) Does the JDK/JIT compiler automatically autovectorize when the code is written using the standard streaming interface? If not, wouldn't it make sense to do so (perhaps in a future JDK version, or perhaps in a JDK from some other vendor)?
  5. If the body of the loop contains many, many if statements (lots of branching, and let's say further that each branch does lots of computation), would that mean that a) autovectorization is probably a BAD idea (just as it would be for a GPU) and b) the JIT compiler is smart enough to determine that autovectorization is a bad idea and so won't autovectorize?
  6. I am currently using Oracle JDK 8, but do the answers above change if one uses JDK 9 or JDK 10, etc.?

Solution

  • To answer your question (1), in principle, a Java compiler could optimize in the presence of a non-inlined method1() call, if it analyzed method1() and determined that it doesn't have any side-effects that would affect the auto-vectorization. In particular, the compiler could prove that the method was "const" (no side effects and no reads from global memory) which in general would enable many optimizations at the call site without inlining. It could also perhaps prove more restricted properties, such as not reading or writing to arrays of a certain type, which would also be enough to allow auto-vectorization to proceed in this case.

    In practice, however, I am not aware of any Java compiler that can do this optimization today. If this answer is to be believed, in HotSpot: "a [not-inlined] method call is typically opaque for JIT compiler." Most Java compilers are based in one way or another on HotSpot, so I don't expect there is a sophisticated Java compiler out there that can do this if HotSpot can't.

    This answer also covers some reasons why such an interprocedural analysis (IPA) is likely to be both difficult and not particularly useful. In particular, methods about which non-trivial things can be proven are often small enough that they'd be inlined anyway. I'm not sure I totally agree: one could also argue that Java inlines aggressively partly because it doesn't do IPA, so strong IPA would perhaps make it possible to do less inlining and consequently reduce runtime code footprint and JIT times.

    The other method variants you ask about in (2) and (3) don't change anything: the compiler would still need IPA to allow it to vectorize, and as far as I know Java compilers don't have it.

    (4) and (5) seem like they should be asked as totally separate questions.

    As for (6), I don't think it has changed, but it would make a good question for the OpenJDK HotSpot mailing lists: I think you'd get a good answer.

    Finally, it's worth noting that even in the absence of IPA and knowing nothing about method1(), a compiler could optimize the math on a, b and c if it could prove none of them had escaped. This seems pretty useless in general though: it would mean that all those variables would have been allocated in this function (or some function inlined into this one), whereas I would imagine that in most realistic scenarios at least one of the three is passed in by the caller.
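To make the point about opaque calls concrete, here is a minimal sketch of why the compiler has to know what method1() does before it can reorder the loop. Everything in it is hypothetical and for illustration only: the class name, the static arrays, and the body of method1(), which deliberately writes to b. The second method does not use SIMD at all; it just hand-simulates the reordering a vectorizer would effectively perform (several multiplies grouped together, ahead of the calls that originally came between them).

```java
import java.util.Arrays;

class OpaqueCallSketch {
    // Hypothetical shared state: the loop reads b and c, method1() writes b.
    static int[] b;
    static int[] c;
    static int calls;

    static void reset() {
        b = new int[] {1, 2, 3, 4};
        c = new int[] {10, 10, 10, 10};
        calls = 0;
    }

    // Hypothetical stand-in for method1(): it has a side effect on b that a
    // compiler cannot see unless it inlines (or otherwise analyzes) the call.
    static void method1() {
        calls++;
        if (calls < b.length) {
            b[calls] += 100; // changes an element that a later iteration reads
        }
    }

    // The order the source code specifies: multiply, call, multiply, call, ...
    static int[] scalarOrder() {
        int[] a = new int[b.length];
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] * c[i];
            method1();
        }
        return a;
    }

    // Hand-simulated "vectorized" order: all multiplies first (as if done by
    // one wide SIMD operation), then all the calls.
    static int[] reordered() {
        int[] a = new int[b.length];
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] * c[i];
        }
        for (int i = 0; i < a.length; i++) {
            method1();
        }
        return a;
    }

    public static void main(String[] args) {
        reset();
        System.out.println(Arrays.toString(scalarOrder())); // [10, 1020, 1030, 1040]
        reset();
        System.out.println(Arrays.toString(reordered()));   // [10, 20, 30, 40]
    }
}
```

The two results differ, so the reordering is only legal if the compiler can rule out this kind of interference. With the call inlined it can see the body directly; with an opaque call it has to assume the worst, which is the situation described above.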