I'm continuing to explore the JIT assembly output, and I found a pair of strange load/store instructions:
mov 0x30(%rsp),%rdx ; <---- this load
test %edi,%edi
jne 0x00007fd3d27c5032
cmp %r11d,%r10d
jae 0x00007fd3d27c4fbc
mov 0x10(%rbx,%r10,4),%edi
test %edi,%edi
je 0x00007fd3d27c5062
mov 0xc(%rbp),%esi
test %esi,%esi
je 0x00007fd3d27c4fea
mov %r8d,0x1c(%rsp)
mov %rdx,0x30(%rsp) ; <---- this store
mov %rax,0x28(%rsp)
mov %ecx,0x10(%rsp)
mov %rbp,0x20(%rsp)
mov %rbx,0x8(%rsp)
mov %r13d,%ebp
mov %r10d,0x14(%rsp)
mov %r11d,0x18(%rsp)
mov %r14d,0x40(%rsp)
mov %r9,(%rsp)
lea (%r12,%rdi,8),%rdx
shl $0x3,%rsi
callq 0x00007fd3caceaf00
mov 0x20(%rsp),%r11
mov 0x10(%r11),%r10d
mov 0x8(%r12,%r10,8),%r8d
cmp $0xf2c10,%r8d
jne 0x00007fd3d27c4ffa
lea (%r12,%r10,8),%r8
mov 0x10(%r8),%r10
movabs $0x7fffffffffffffff,%r9
cmp %r9,%r10
je 0x00007fd3d27c5092
mov %r10,%rdx
add $0x1,%rdx
test %rdx,%rdx
jle 0x00007fd3d27c50ce
mov %r10,%rax
lock cmpxchg %rdx,0x10(%r8)
sete %r11b
movzbl %r11b,%r11d
test %r11d,%r11d
je 0x00007fd3d27c5116
test %r10,%r10
jle 0x00007fd3d27c4f48
mov 0x108(%r15),%r11
mov 0x14(%rsp),%r10d
inc %r10d
mov 0x1c(%rsp),%r8d
inc %r8d
test %eax,(%r11)
mov (%rsp),%r9
mov 0x40(%rsp),%r14d
mov 0x18(%rsp),%r11d
mov %ebp,%r13d
mov 0x8(%rsp),%rbx
mov 0x20(%rsp),%rbp
mov 0x10(%rsp),%ecx
mov 0x28(%rsp),%rax
movzbl 0x18(%r9),%edi
movslq %r8d,%rsi
cmp 0x30(%rsp),%rsi
jge 0x00007fd3d27c4f17
cmp %r11d,%r10d
jl 0x00007fd3d27c4dea ; this is the end of the loop
; jump to the first instruction in this listing
Why are these instructions needed? Nothing touches %rdx between the load and the store. Yes, this is a loop, but I don't see how it could be useful on the next iterations either.
Is it a bug or is it the same sort of JVM tricks as in my previous question?
I've found the same problem in this article but there is no explanation there.
The full PrintAssembly output is available here, and the original code is here.
Thanks!
I've reproduced the full assembly code for ArraySubscription.slowPath. Although the register mapping differs slightly from your snippet, the code structure is exactly the same.
The incomplete fragment led you to the wrong conclusion. In fact, %rdx can change between the load and the store, because there is a branch target in the middle: L219 -> L55
This becomes quite understandable when looking at the corresponding Java source code:
while (true) {
    for (; sent < n && idx < length; sent++, idx++) {
        if (canceled) {
            return;
        }
        T element = array[idx];
        if (element == null) {
            subscriber.onError(new NullPointerException());
            return;
        }
        subscriber.onNext(element);
    }
Perfasm showed you the compiled code for the hot inner for loop. The value at 0x30(%rsp), which is also cached in %rdx, holds the local variable n. But then, after the loop, the value of n changes:
n = requested;
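and the outer while continues. The corresponding compiled code updates n only in a register, not in 0x30(%rsp). That is why the store is needed: when the outer loop jumps back into the inner loop past the load, %rdx carries the new n, and the store writes it to 0x30(%rsp) before the call, so that the loop condition (cmp 0x30(%rsp),%rsi) compares against the current value.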
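To see the effect at the source level, here is a minimal, self-contained sketch of the same loop shape. It is not the original ArraySubscription code: the class SlowPathSketch, the method drain, the observedN list, and the simulated extra request at idx == 1 are all hypothetical and exist only for illustration. It shows that n takes a new value after the inner loop, so any cached copy of n has to be refreshed before the inner loop runs again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for the nested loop discussed above.
final class SlowPathSketch {

    // Emits array elements into 'out' while there is demand.
    // Returns every value of n observed at the top of the outer loop.
    static List<Long> drain(int[] array, AtomicLong requested, List<Integer> out) {
        List<Long> observedN = new ArrayList<>();
        int idx = 0;
        long sent = 0;
        long n = requested.get();

        while (true) {
            observedN.add(n);

            // Hot inner loop: n is only read here, never written.
            for (; sent < n && idx < array.length; sent++, idx++) {
                out.add(array[idx]);
                if (idx == 1) {
                    // Assumption for the demo: the consumer asks for
                    // two more elements while it is being served.
                    requested.addAndGet(2);
                }
            }

            if (idx == array.length) {
                return observedN;   // everything emitted
            }

            n = requested;          // placeholder replaced below
        }
    }
}
```

The line marked as a placeholder above is deliberately the one-line form from the answer; in plain Java, with requested modeled as an AtomicLong, the loop tail would read as follows (again an assumption about how to model the field, not the original source):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Runnable variant of the sketch: 'requested' is an AtomicLong here.
final class SlowPathDemo {

    static List<Long> drain(int[] array, AtomicLong requested, List<Integer> out) {
        List<Long> observedN = new ArrayList<>();
        int idx = 0;
        long sent = 0;
        long n = requested.get();

        while (true) {
            observedN.add(n);
            for (; sent < n && idx < array.length; sent++, idx++) {
                out.add(array[idx]);
                if (idx == 1) {
                    requested.addAndGet(2); // simulated extra demand
                }
            }
            if (idx == array.length) {
                return observedN;
            }
            n = requested.get();    // n changes after the inner loop
            if (n == sent) {
                return observedN;   // no new demand arrived
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        List<Long> seen = drain(new int[] {10, 20, 30, 40, 50}, new AtomicLong(2), out);
        System.out.println("emitted = " + out + ", n per outer pass = " + seen);
    }
}
```

In the second variant the inner loop runs twice with two different bounds, which is the situation the compiled code handles by writing the register copy of n back to its stack slot on entry to the inner loop.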