I want to perform division on an FPU in C (using integer values):
float foo;
uint32_t *ptr1, *ptr2;
foo = (float)*(ptr1) / (float)*(ptr2);
Disassembled in NASM syntax (from the object file compiled with GCC), it looks like this:
mov rax, QWORD [ptr1]
mov eax, DWORD [rax]
mov eax, eax
test rax, rax
js ?_001
pxor xmm0, xmm0
cvtsi2ss xmm0, rax
jmp ?_002
?_001:
mov rdx, rax
shr rdx, 1
and eax, 01H
or rdx, rax
pxor xmm0, xmm0
cvtsi2ss xmm0, rdx
addss xmm0, xmm0
?_002:
mov rax, QWORD [ptr2]
; ... for ptr2 pattern repeats
What does this "black magic" under ?_001 mean? Isn't cvtsi2ss alone enough to convert an integer to a float?
In general, cvtsi2ss does the trick: it converts a scalar integer to a scalar single-precision float (other sources call it "doubleword integer to scalar single", but my naming is consistent with the other vector instructions). However, it expects a signed integer.
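To make the signed-only limitation concrete, here is a small C sketch of what a plain signed conversion does to a large unsigned value. The function name convert_as_signed is made up for illustration; it only mimics feeding the raw 64-bit pattern to a signed conversion and is not what the compiler emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the 64 bits as signed, then convert,
 * which is roughly what a bare cvtsi2ss on the full register would do. */
static float convert_as_signed(uint64_t v)
{
    int64_t s;
    memcpy(&s, &v, sizeof s);   /* same bit pattern, now read as signed */
    return (float)s;            /* signed 64-bit to float conversion */
}
```

With bit 63 set, for example 0x8000000000000000, the result comes out negative instead of 2^63, which is why the extra path exists.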
So this code
mov rdx, rax
shr rdx, 1
and eax, 01H
or rdx, rax
pxor xmm0, xmm0
cvtsi2ss xmm0, rdx
addss xmm0, xmm0
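helps convert an unsigned value with an instruction that only understands signed ones (note the js jump: this code runs only if the sign bit is set, otherwise it is skipped). The test is done on the 64-bit rax, so the sign bit in question is bit 63, which is set when the unsigned value is greater than 0x7FFFFFFFFFFFFFFF. A uint32_t that has been zero-extended by mov eax, eax never has that bit set, so for this particular input the branch is never actually taken.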
So the "magic" code does:
mov rdx, rax ; copy the loaded value into rdx
shr rdx, 1 ; divide by 2 - logical shift, not arithmetic, because the value is unsigned
and eax, 01H ; keep only the least significant bit (the one the shift dropped)
or rdx, rax ; OR that bit back into the halved value as a "sticky" bit so the conversion still rounds correctly
pxor xmm0, xmm0
cvtsi2ss xmm0, rdx
addss xmm0, xmm0 ; add to itself = multiply by 2
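The same steps can be written out in C as a rough sketch. The name u64_to_float_sketch is made up for illustration, and the sketch assumes the full 64-bit unsigned case, since that is the only case where the slow path can run.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the generated sequence step by step. */
static float u64_to_float_sketch(uint64_t v)
{
    if ((v >> 63) == 0) {
        /* Sign bit clear: the value fits in int64_t, so the signed
         * conversion (a single cvtsi2ss) is already correct. */
        return (float)(int64_t)v;
    }
    /* Sign bit set: halve the value so it fits in int64_t, keeping the
     * dropped low bit as a sticky bit so rounding is not disturbed. */
    uint64_t half = (v >> 1) | (v & 1);
    float f = (float)(int64_t)half;
    return f + f;   /* doubling is exact and undoes the halving */
}
```

The sticky bit matters for inputs such as 0x8000008000000001: without it the halved value sits exactly between two floats and rounds down, while the true value lies just above the midpoint and must round up.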
I'm not sure which compiler and compile options you used. For me, GCC simply generates:
cvtsi2ssq xmm0, rbx
cvtsi2ssq xmm1, rax
divss xmm0, xmm1
I hope it helps.