Search code examples
javarecursionjitproject-loom

Does the Java JIT ever optimize away recursive method calls?


I know Java has yet to add tail-call elimination optimization and intends to do so as a late stage addition to Project Loom. My question is instead: does the JIT ever optimize away recursive method calls in their entirety and convert them into an iterative form? It seems nominally possible but relatively difficult so I'm guessing if they did it would be strongly described in some doc, but I'm struggling to track anything on the subject down.

As a follow-up, if the JIT does eliminate recursive calls in some form beyond what's described in Loom, how does that appear on stack traces?


Solution

  • Your question “how does that appear on stack traces?” is one of the unsolved problems of tail call optimization in a JVM. There’s an expectation of manageability of the Java code, especially when code is spending a long time in a recursive algorithm (or even hanging in an endless recursion) and the developer wants to find out what’s going on.

    So in short, the JVM (current version of HotSpot) has no tail call optimization. But it is capable of inlining a finite number of recursive invocations, just like it can inline other invocations.

    When we use

    public class Recursion {
        public static void main(String[] args) {
            runs: for(int run = 0; run < 10; run++) {
                for(int i = 1; i < 1_000_000_000; i++) try {
                    int counted = recursiveCounter(i, 0);
                    if(i != counted) throw new AssertionError();
                } catch(StackOverflowError e) {
                    System.out.println("limit reached at " + i);
                    continue runs;
                }
            }
        }
    
        private static int recursiveCounter(int limit, int count) {
            return limit == 0? count: recursiveCounter(limit - 1, count + 1);
        }
    }
    

    we get values similar to those discussed in Why is the max recursion depth I can reach non-deterministic? Running with -XX:CompileCommand=print,Recursion.recursiveCounter
    -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining prints

    @ 18   Recursion::recursiveCounter (18 bytes)   inline
      @ 14   Recursion::recursiveCounter (18 bytes)   inline
        @ 14   Recursion::recursiveCounter (18 bytes)   recursive inlining too deep
    

    and

      0x0000026398871220:   mov     dword ptr [rsp+0ffffffffffff9000h],eax
      0x0000026398871227:   push    rbp
      0x0000026398871228:   sub     rsp,10h                     ;*synchronization entry
                                                                ; - Recursion::recursiveCounter@-1 (line 16)
      0x000002639887122c:   test    edx,edx
      0x000002639887122e:   je      26398871254h                ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@1 (line 16)
      0x0000026398871230:   cmp     edx,1h
      0x0000026398871233:   je      26398871259h                ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@1 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000026398871235:   add     r8d,2h                      ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@13 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000026398871239:   add     edx,0fffffffeh              ;*isub {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@10 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x000002639887123c:   nop
      0x000002639887123f:   call    26398871220h                ; ImmutableOopMap {}
                                                                ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ;   {static_call}
      0x0000026398871244:   add     rsp,10h
      0x0000026398871248:   pop     rbp
      0x0000026398871249:   mov     r10,qword ptr [r15+110h]
      0x0000026398871250:   test    dword ptr [r10],eax         ;   {poll_return}
      0x0000026398871253:   ret
      0x0000026398871254:   mov     eax,r8d
      0x0000026398871257:   jmp     26398871244h
      0x0000026398871259:   mov     eax,r8d
      0x000002639887125c:   inc     eax                         ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@13 (line 16)
      0x000002639887125e:   jmp     26398871244h                ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000026398871260:   mov     rdx,rax
      0x0000026398871263:   add     rsp,10h
      0x0000026398871267:   pop     rbp
      0x0000026398871268:   jmp     26390ea4d00h                ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@17 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ;   {runtime_call _rethrow_Java}
    

    The interesting part is the test edx,edx instruction matching our limit == 0 condition. If this condition is not fulfilled, there’s a cmp edx,1h test following, effectively testing what the next recursive invocation would test and if this condition also isn’t fulfilled, the code will perform add r8d,2h and add edx,0fffffffeh, incrementing the count by two and decrementing the limit by two.

    So we clearly see the effect of inlining one level of recursion and fusing the operations. The inlining diagnostics tells us that a limit has been crossed, so it’s worth to investigate what happens when specifying, e.g. -XX:MaxRecursiveInlineLevel=4:

    @ 18   Recursion::recursiveCounter (18 bytes)   inline
      @ 14   Recursion::recursiveCounter (18 bytes)   inline
        @ 14   Recursion::recursiveCounter (18 bytes)   inline
          @ 14   Recursion::recursiveCounter (18 bytes)   inline
            @ 14   Recursion::recursiveCounter (18 bytes)   inline
              @ 14   Recursion::recursiveCounter (18 bytes)   recursive inlining too deep
    
      0x0000025345d31620:   mov     dword ptr [rsp+0ffffffffffff9000h],eax
      0x0000025345d31627:   push    rbp
      0x0000025345d31628:   sub     rsp,10h                     ;*synchronization entry
                                                                ; - Recursion::recursiveCounter@-1 (line 16)
      0x0000025345d3162c:   test    edx,edx
      0x0000025345d3162e:   je      25345d31660h                ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@1 (line 16)
      0x0000025345d31630:   cmp     edx,1h
      0x0000025345d31633:   je      25345d31665h                ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@1 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d31635:   cmp     edx,2h
      0x0000025345d31638:   je      25345d3166ch                ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@1 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d3163a:   cmp     edx,3h
      0x0000025345d3163d:   je      25345d31674h                ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@1 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d3163f:   cmp     edx,4h
      0x0000025345d31642:   je      25345d3167ch                ;*ifne {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@1 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d31644:   add     r8d,5h                      ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@13 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d31648:   add     edx,0fffffffbh
      0x0000025345d3164b:   call    25345d31620h                ; ImmutableOopMap {}
                                                                ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ;   {static_call}
      0x0000025345d31650:   add     rsp,10h
      0x0000025345d31654:   pop     rbp
      0x0000025345d31655:   mov     r10,qword ptr [r15+110h]
      0x0000025345d3165c:   test    dword ptr [r10],eax         ;   {poll_return}
      0x0000025345d3165f:   ret
      0x0000025345d31660:   mov     eax,r8d
      0x0000025345d31663:   jmp     25345d31650h
      0x0000025345d31665:   mov     eax,r8d
      0x0000025345d31668:   inc     eax                         ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@13 (line 16)
      0x0000025345d3166a:   jmp     25345d31650h
      0x0000025345d3166c:   mov     eax,r8d
      0x0000025345d3166f:   add     eax,2h                      ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@13 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d31672:   jmp     25345d31650h
      0x0000025345d31674:   mov     eax,r8d
      0x0000025345d31677:   add     eax,3h                      ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@13 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d3167a:   jmp     25345d31650h
      0x0000025345d3167c:   mov     eax,r8d
      0x0000025345d3167f:   add     eax,4h                      ;*iadd {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@13 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d31682:   jmp     25345d31650h                ;*invokestatic recursiveCounter {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
      0x0000025345d31684:   mov     rdx,rax
      0x0000025345d31687:   add     rsp,10h
      0x0000025345d3168b:   pop     rbp
      0x0000025345d3168c:   jmp     2533e364d00h                ;*ireturn {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Recursion::recursiveCounter@17 (line 16)
                                                                ; - Recursion::recursiveCounter@14 (line 16)
                                                                ;   {runtime_call _rethrow_Java}
    

    We can see that now, the compiled code has learned to add up to five at once. The application will also print a much larger limit of nested invocations. The comments behind the instruction show that the meta information about the nested invocations is still present, even if there are no individual instructions associated with an invocation level. This implies that a stack trace representing the whole original call chain could still be constructed.

    Of course, this is not the same as actual tail call optimization. Regardless of how high we set the limit, it’s still inlining a finite number of invocation and it’s already recognizable that setting the limit too high would yield unreasonable code size.

    So the answer to the literal question “Does the Java JIT ever optimize away recursive method calls?” is “Yes, it does”. But not the entire recursion but a finite number of calls, in other words, not in the form of tail call optimization.