Tags: assembly, x86-64, nasm, memory-alignment, avx

YASM: vmovaps instruction causing segmentation fault


Problem: vmovaps is giving me a segmentation fault.

Context: The x86-64 instruction vmovaps is designed to be used with the AVX registers on a Core i-series processor (which is what I am running this on). The AVX ymm registers are twice as wide as the SSE xmm ones (256 vs. 128 bits respectively). The vmovaps instruction should move a vector of aligned 32-bit floating-point values into the specified ymm register.

Likely Cause: The alignment of the source data is of particular importance, as incorrectly aligned data is a common source of segmentation faults. However, even though I have aligned my data, I am still getting a segmentation fault.

Example

    segment .data

align 16
xs:
    dd  0.0
    dd  1.1
    dd  2.2
    dd  3.3
    dd  4.4
    dd  5.5
    dd  6.6
    dd  7.7

align 16
ys:
    dd  8.8
    dd  7.7
    dd  6.6
    dd  5.5
    dd  4.4
    dd  3.3
    dd  2.2
    dd  1.1

    segment .text
    global main

main:
    push rbp
    mov rbp, rsp

    ; Move eight 32-bit floats from "xs" into ymm0
    vmovaps ymm0, [xs]

    ; Move eight 32-bit floats from "ys" into ymm1
    vmovaps ymm1, [ys]

    ; Add the eight pairs of floats simultaneously, put the result in ymm0
    vaddps ymm0, ymm1

    xor rax, rax
    leave
    ret

Compiled with: yasm -f elf64 -g dwarf2 <filename>

Linked with: gcc -o <bin-name> <filename>.o

When I run this with GDB, it simply reports that it received a segmentation fault signal on the first vmovaps instruction. I have checked the documentation on alignment and I think it is all correct. For what it's worth, I am building and running this on an i5 8600K.

I've also looked at this similar question. However, I can't really apply the answer to his problem to mine (it has something to do with his inline assembly). If anyone could weigh in on this I'd be grateful!


Solution

  • vmovaps with a ymm operand requires 32-byte alignment. To quote the manual:

    When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version), **32-byte (VEX.256 encoded version)** or 64-byte (EVEX.512 encoded version) boundary or a general-protection exception (#GP) will be generated. For EVEX.512 encoded versions, the operand must be aligned to the size of the memory operand.

    (emphasis added). Linux delivers SIGSEGV to processes that cause a #GP exception.

    Thus, you should change align 16 to align 32 for your static arrays of dd elements.

    Or use vmovups to do unaligned loads and let the hardware handle it; it is the same speed on data that happens to be aligned, and on most CPUs also for loads/stores that don't split across a cache-line boundary.

    Related: How to solve the 32-byte-alignment issue for AVX load/store operations? for C and C++ ways of aligning things, including arrays in automatic (stack) or dynamic storage.
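Concretely, the fix is a one-word change to the directive preceding each array (shown here for xs in YASM/NASM syntax; ys needs the same change):

    align 32            ; 32-byte boundary, required by vmovaps with a ymm destination
    xs:
        dd  0.0
        dd  1.1
        dd  2.2
        dd  3.3
        dd  4.4
        dd  5.5
        dd  6.6
        dd  7.7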