I have a Java loop that looks like this:
    public void testMethod() {
        int[] nums = new int[10];
        for (int i = 0; i < nums.length; i++) {
            nums[i] = 0x42;
        }
    }
The assembly I get is this:
0x00000001296ac845: cmp %r10d,%ebp
0x00000001296ac848: jae 0x00000001296ac8b4
0x00000001296ac84a: movl $0x42,0x10(%rbx,%rbp,4)
0x00000001296ac852: inc %ebp
0x00000001296ac854: cmp %r11d,%ebp
0x00000001296ac857: jl 0x00000001296ac845

0x00000001296ac859: mov %r10d,%r8d
0x00000001296ac85c: add $0xfffffffd,%r8d
0x00000001296ac860: mov $0x80000000,%r9d
0x00000001296ac866: cmp %r8d,%r10d
0x00000001296ac869: cmovl %r9d,%r8d
0x00000001296ac86d: cmp %r8d,%ebp
0x00000001296ac870: jge 0x00000001296ac88e

0x00000001296ac872: vmovq -0xda(%rip),%xmm0
0x00000001296ac87a: vpunpcklqdq %xmm0,%xmm0,%xmm0
0x00000001296ac87e: xchg %ax,%ax
0x00000001296ac880: vmovdqu %xmm0,0x10(%rbx,%rbp,4)
0x00000001296ac886: add $0x4,%ebp
0x00000001296ac889: cmp %r8d,%ebp
0x00000001296ac88c: jl 0x00000001296ac880
If my understanding is correct, the first block of assembly is the one that executes nums[i] = 0x42; and in the third block there is a vmovdqu instruction.
The vmovdqu instruction moves values from an integer vector to an unaligned memory location.
However, I still don't fully understand what vmovdqu is doing in the context of my loop.
What exactly is the third block of assembly code doing?
The complete code is available here: https://pastebin.com/cT5cJcMS
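The optimizer has chosen to vectorize your loop, setting 4 values per "iteration": each vmovdqu stores the 16-byte XMM0 register, which holds four 32-bit ints, to nums[i] through nums[i+3], and the add $0x4,%ebp then advances i by 4. (The instructions preceding the vmovdqu are fairly opaque, but presumably they splat 0x42 into all lanes of XMM0: vmovq loads a 64-bit constant, most likely two copies of 0x42, and vpunpcklqdq duplicates that into both halves of the register.) The "unaligned" variant is necessary because the array is not guaranteed to be SIMD-aligned in memory (after all, it's storing int32s, not int32x4s).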
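
To make the shape of the compiled code easier to see, here is a rough Java sketch of what the three blocks appear to do. It is only an illustration, not what HotSpot literally generates: the class name VectorizedFillSketch, the method fill and the parameter preLoopLimit are hypothetical (preLoopLimit stands in for whatever %r11d holds in the first block), and the final clean-up loop is an assumption, since the listing in the question stops before that part.

```java
import java.util.Arrays;

public class VectorizedFillSketch {

    // Hypothetical helper that mimics the structure of the JIT-compiled loop.
    static void fill(int[] nums, int value, int preLoopLimit) {
        int i = 0;

        // First block: scalar pre-loop, one element per iteration
        // (corresponds to movl $0x42,... followed by inc %ebp).
        int pre = Math.min(preLoopLimit, nums.length);
        for (; i < pre; i++) {
            nums[i] = value;
        }

        // Second block: compute the main-loop limit, length - 3
        // (add $0xfffffffd,%r8d adds -3), so that i + 3 stays in bounds.
        int limit = nums.length - 3;

        // Third block: one 16-byte store writes four ints at once
        // (vmovdqu %xmm0,...), then i advances by 4 (add $0x4,%ebp).
        for (; i < limit; i += 4) {
            nums[i] = value;
            nums[i + 1] = value;
            nums[i + 2] = value;
            nums[i + 3] = value;
        }

        // Assumed clean-up loop for the remaining 0 to 3 elements
        // (not visible in the excerpt from the question).
        for (; i < nums.length; i++) {
            nums[i] = value;
        }
    }

    public static void main(String[] args) {
        int[] nums = new int[10];
        fill(nums, 0x42, 1);
        // Every element should now be 0x42 (66 in decimal).
        System.out.println(Arrays.toString(nums));
    }
}
```

The four assignments in the middle loop stand in for the single vmovdqu store; in the real machine code they are one instruction, which is where the speed-up comes from.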