Problem: vmovaps is giving me a segmentation fault.
Context: The x86-64 instruction vmovaps is designed to be used with the AVX registers, which are available on the Core i-series processor this system runs on. The AVX registers are twice as wide as the SSE ones (256 vs. 128 bits respectively). vmovaps should move a vector of aligned 32-bit floating-point values into the specified ymm register.
Likely Cause: The alignment of the source data is of particular importance; incorrectly aligned data is a common source of segmentation faults. However, I am encountering a segmentation fault even though I have aligned my data.
segment .data
align 16
xs:
dd 0.0
dd 1.1
dd 2.2
dd 3.3
dd 4.4
dd 5.5
dd 6.6
dd 7.7
align 16
ys:
dd 8.8
dd 7.7
dd 6.6
dd 5.5
dd 4.4
dd 3.3
dd 2.2
dd 1.1
segment .text
global main
main:
push rbp
mov rbp, rsp
; Move eight 32-bit floats from "xs" into ymm0
vmovaps ymm0, [xs]
; Move eight 32-bit floats from "ys" into ymm1
vmovaps ymm1, [ys]
; Add the eight pairs of floats simultaneously, put the result in ymm0
vaddps ymm0, ymm1
xor rax, rax
leave
ret
Compiled with: yasm -f elf64 -g dwarf2 <filename>
Linked with: gcc -o <bin-name> <filename>.o
When I run this under GDB, it simply reports that the program received a segmentation fault signal on the first vmovaps instruction. I have checked the documentation on alignment and I think it is all correct. For what it's worth, I am running this on an i5-8600K.
I've also looked at this similar question, but I can't really apply the answer to his problem to mine (it was something to do with his inline assembly). If anyone could weigh in on this I'd be grateful!
vmovaps with a ymm operand requires 32-byte alignment. To quote the manual:
When the source or destination operand is a memory operand, the operand must be aligned on a 16-byte (128-bit version), 32-byte (VEX.256 encoded version) or 64-byte (EVEX.512 encoded version) boundary or a general-protection exception (#GP) will be generated. For EVEX.512 encoded versions, the operand must be aligned to the size of the memory operand.
(emphasis added). Linux delivers SIGSEGV to processes that raise a #GP exception.
Thus, you should change align 16 to align 32 for your static arrays of dd elements.
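A sketch of that fix applied to your data section, keeping your labels (the dd values are merged onto single lines purely for brevity):

```nasm
segment .data
align 32          ; vmovaps with a ymm register needs 32-byte alignment
xs:
    dd 0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7
align 32          ; xs is exactly 32 bytes here, but re-aligning is cheap insurance
ys:
    dd 8.8, 7.7, 6.6, 5.5, 4.4, 3.3, 2.2, 1.1
```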
Or use vmovups to do unaligned loads and let the hardware handle it; it's the same speed on data that happens to be aligned, and on most CPUs also for loads/stores that don't split across a cache-line boundary.
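A sketch of that alternative in your main, using your existing labels:

```nasm
; vmovups tolerates any alignment, so align 16 in .data is enough
vmovups ymm0, [xs]
vmovups ymm1, [ys]
vaddps  ymm0, ymm0, ymm1
```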
Related: How to solve the 32-byte-alignment issue for AVX load/store operations? for C and C++ ways of aligning things, including arrays in automatic (stack) or dynamic storage.