Tags: assembly, x86, avx, java-bytecode-asm, jvm-hotspot

What is vmovdqu doing here?


I have a Java loop that looks like this:

public void testMethod() {
    int[] nums = new int[10];
    for (int i = 0; i < nums.length; i++) {
        nums[i] = 0x42;
    }
} 

The assembly I get is this:

0x00000001296ac845: cmp    %r10d,%ebp
0x00000001296ac848: jae    0x00000001296ac8b4
0x00000001296ac84a: movl   $0x42,0x10(%rbx,%rbp,4)
0x00000001296ac852: inc    %ebp               
0x00000001296ac854: cmp    %r11d,%ebp
0x00000001296ac857: jl     0x00000001296ac845  

0x00000001296ac859: mov    %r10d,%r8d
0x00000001296ac85c: add    $0xfffffffd,%r8d
0x00000001296ac860: mov    $0x80000000,%r9d
0x00000001296ac866: cmp    %r8d,%r10d
0x00000001296ac869: cmovl  %r9d,%r8d
0x00000001296ac86d: cmp    %r8d,%ebp
0x00000001296ac870: jge    0x00000001296ac88e
0x00000001296ac872: vmovq  -0xda(%rip),%xmm0                                                    
0x00000001296ac87a: vpunpcklqdq %xmm0,%xmm0,%xmm0
0x00000001296ac87e: xchg   %ax,%ax

0x00000001296ac880: vmovdqu %xmm0,0x10(%rbx,%rbp,4)  
0x00000001296ac886: add    $0x4,%ebp          
0x00000001296ac889: cmp    %r8d,%ebp
0x00000001296ac88c: jl     0x00000001296ac880  

If my understanding is correct, the first block of assembly is the one that performs nums[i] = 0x42. In the third block there is vmovdqu, which is documented as follows:

The vmovdqu instruction moves values from an integer vector to an unaligned memory location.

However, I still don't fully understand what vmovdqu is doing in the context of my loop.

What exactly is the third block of assembly code doing?

The complete code is available here: https://pastebin.com/cT5cJcMS


Solution

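  • The optimizer has chosen to vectorize your loop, storing 4 values per "iteration". The two instructions just before the loop build the vector: vmovq loads a 64-bit constant (presumably two copies of 0x42) into the low half of XMM0, and vpunpcklqdq duplicates that half into the high half, so all four 32-bit lanes of XMM0 end up holding 0x42. vmovdqu then writes all 16 bytes to nums[i] through nums[i+3], and add $0x4,%ebp advances i by 4. The "unaligned" variant is necessary because the array is not guaranteed to be SIMD-aligned in memory (after all, it's storing int32s, not int32x4s).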
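
To make the structure of the listing more concrete, here is a rough Java model of what the three blocks appear to do. It is only a sketch under some assumptions: the class name, the fill method and its preLoopLimit parameter are hypothetical (the real pre-loop bound lives in %r11d and is not visible in the excerpt), and the final scalar clean-up loop is assumed to exist beyond the jge target, since it is not part of the posted code.

```java
import java.util.Arrays;

// Hypothetical model of the JIT-compiled loop; all names are illustrative only.
public class VectorizedFillSketch {
    // One 128-bit XMM register holds four 32-bit ints.
    static final int LANES = 4;

    static int[] fill(int length, int preLoopLimit) {
        int[] nums = new int[length];
        int i = 0;

        // Block 1: scalar loop, one bounds-checked store per iteration
        // (cmp/jae is the bounds check, movl is the store).
        int preLimit = Math.min(preLoopLimit, length);
        for (; i < preLimit; i++) {
            nums[i] = 0x42;
        }

        // Block 2: compute the vector loop limit, length - 3.
        // The cmovl guards against the subtraction wrapping around; with a
        // non-negative array length this branch is never taken.
        int limit = length - (LANES - 1);
        if (length < limit) {
            limit = Integer.MIN_VALUE;
        }

        // vmovq + vpunpcklqdq: 0x42 replicated into all four lanes.
        int[] xmm0 = {0x42, 0x42, 0x42, 0x42};

        // Block 3: each vmovdqu stores four ints at once, then i += 4.
        for (; i < limit; i += LANES) {
            System.arraycopy(xmm0, 0, nums, i, LANES);
        }

        // Assumed scalar clean-up for the remaining 0..3 elements
        // (not shown in the excerpt).
        for (; i < length; i++) {
            nums[i] = 0x42;
        }
        return nums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fill(10, 1)));
    }
}
```

Because the vector loop only runs while i < length - 3, every 4-element store stays inside the array, which is why that loop needs no per-element bounds check.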