I'm just getting started with NASM and I'm having some issues with the __vectorcall convention. When calling my test function (sin_f), I get an access violation when returning from the callee.
; float sin_f(float)
global sin_f@@4
section .text
sin_f@@4:
push rdi
; sub rsp, 16 ; make room for xmm
movss [rsp - 16], xmm0 ; mov float arg onto stack
fld qword [rsp - 16] ; push argument on float stack
fsin ; do sin in radians
fstp qword [rsp - 16] ; pop float stack
movss xmm0, [rsp - 16] ; move back to xmm0
movq rax, xmm0
; add rsp, 16 ; reset stack
pop rdi
ret
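The version above stores the argument with movss, which writes 4 bytes. It then reloads it with fld qword, which reads 8 bytes, so the x87 unit sees the float's bit pattern plus 4 unrelated bytes as a double. The small C++ simulation below shows the effect. It is only a sketch: it assumes a little-endian IEEE-754 layout as on x86, zero-fills the unrelated bytes, and uses made-up function names.

```cpp
#include <cassert>
#include <cstring>

// Simulates an 8-byte scratch slot on the stack. movss writes only the low
// 4 bytes; the upper 4 bytes are zero-filled here so the result is deterministic.
double read_float_slot_as_double(float value) {
    unsigned char slot[8] = {};                        // scratch slot
    std::memcpy(slot, &value, sizeof value);           // like `movss [slot], xmm0`
    double as_double;
    std::memcpy(&as_double, slot, sizeof as_double);   // like `fld qword [slot]`
    return as_double;
}

// Same store, but read back with the matching 4-byte size.
float read_float_slot_as_float(float value) {
    unsigned char slot[8] = {};
    std::memcpy(slot, &value, sizeof value);           // like `movss [slot], xmm0`
    float as_float;
    std::memcpy(&as_float, slot, sizeof as_float);     // like `fld dword [slot]`
    return as_float;
}
```

With the upper half zeroed, 1.0f read back as a double becomes a tiny denormal value instead of 1.0. Using dword for both fld and fstp keeps the sizes matched.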
I'm obviously not doing the cleanup correctly, but so far all my attempts have failed. Looking at some MSVC disassembly, I saw that it pushes/pops rdi, so I've added that. Instead of doing sub/add on rsp (which was causing crashes inside no-man's-land), I just address memory directly below rsp.
This article covers popular calling conventions and mentions that __vectorcall is similar to __fastcall. However, using ret 4 doesn't change anything, and MSVC doesn't do that itself either. Oh, and I'm also copying the result to rax just because.
Any help with these concepts would be greatly appreciated. Thanks!
Edit: The error is:
Exception thrown at 0x00007FF6198B2C5A in demo1.exe:
0xC0000005: Access violation reading location 0x00000000B817FA20
Caller disassembly:
; 13 : T sin(T angle) {
$LN3:
movss DWORD PTR [rsp+8], xmm0
push rdi
sub rsp, 48 ; 00000030H
mov rdi, rsp
mov ecx, 12
mov eax, -858993460 ; ccccccccH
rep stosd
; 14 : static_assert(std::is_floating_point_v<T>, "requires floating point");
; 15 : if constexpr (std::is_same_v<float, T>) {
; 16 : return detail::sin_f(angle);
movss xmm0, DWORD PTR angle$[rsp]
call sin_f@@8
; 17 : } else {
; 18 : return detail::sin_d(angle);
; 19 : }
; 20 :
; 26 : }
add rsp, 48 ; 00000030H
pop rdi
ret 0
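The caller above calls sin_f@@8, not sin_f@@4, for a single float parameter. The number after @@ is the byte size of the parameter list. The C++ sketch below models how that number could be derived. It assumes each parameter's size is rounded up to the pointer size (8 on x64, 4 on x86) before summing. That assumption matches the @@8 seen here, but the helper names are hypothetical and are not an MSVC API.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Hypothetical helper. ASSUMPTION: each parameter is rounded up to the
// pointer size (8 bytes on x64, 4 on x86) before the sizes are summed.
std::size_t vectorcall_param_bytes(std::initializer_list<std::size_t> param_sizes,
                                   std::size_t pointer_size) {
    std::size_t total = 0;
    for (std::size_t size : param_sizes) {
        // Round each parameter up to the next multiple of pointer_size.
        total += (size + pointer_size - 1) / pointer_size * pointer_size;
    }
    return total;
}

// Hypothetical helper: builds "name@@<bytes>" using the rule above.
std::string vectorcall_decorated_name(const std::string& name,
                                      std::initializer_list<std::size_t> param_sizes,
                                      std::size_t pointer_size) {
    return name + "@@" + std::to_string(vectorcall_param_bytes(param_sizes, pointer_size));
}
```

Under this assumption, vectorcall_decorated_name("sin_f", {sizeof(float)}, 8) yields "sin_f@@8" on x64. The same call with a pointer size of 4 yields "sin_f@@4", which is the name the first attempt used.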
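So the main problem was using @@4 for the byte count in the decorated name; it needs to be @@8. I first wondered whether that was because of the return value, but it seems more likely that on x64 each parameter is counted as a full 8-byte slot, even a 4-byte float.
Also, I mixed up the 32- and 64-bit operand sizes: movss stores a 32-bit float, so fld/fstp need dword rather than qword. This is the final working version: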
; float sin_f(float)
global sin_f@@8
section .text
sin_f@@8:
sub rsp, 24 ; reserve local scratch space (Windows x64 has no red zone) and realign rsp to 16 bytes
movss [rsp], xmm0 ; mov float arg onto stack
fld dword [rsp] ; push argument on float stack
fsin ; do sin in radians
fstp dword [rsp] ; pop float stack
movss xmm0, [rsp] ; move back to xmm0
add rsp, 24 ; release the local scratch space
ret
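The sub rsp, 24 in the final version can be checked with a little arithmetic. This assumes the caller follows the Windows x64 rule that rsp is 16-byte aligned at the call instruction. The 8-byte return address pushed by call then leaves rsp 8 bytes off alignment on entry. Subtracting 24 both reserves scratch space for the float and restores 16-byte alignment. Windows x64 also defines no red zone, so memory below rsp, like the first version's [rsp - 16], is not guaranteed to be preserved. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: rsp is 16-byte aligned just before the caller's `call`.
// `call` pushes an 8-byte return address, so the callee starts 8 bytes lower.
constexpr std::uint64_t rsp_on_entry(std::uint64_t rsp_before_call) {
    return rsp_before_call - 8;
}

// rsp after the callee's prologue subtracts `frame_bytes` (e.g. `sub rsp, 24`).
constexpr std::uint64_t rsp_after_prologue(std::uint64_t rsp_before_call,
                                           std::uint64_t frame_bytes) {
    return rsp_on_entry(rsp_before_call) - frame_bytes;
}

constexpr bool is_aligned16(std::uint64_t rsp) { return rsp % 16 == 0; }
```

An 8-byte adjustment, such as the push rdi in the first version, would also realign rsp. A 16-byte one would not.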