FPU operations generated by GCC when casting an integer to float

I want to perform division on an FPU in C (using integer values):

    float foo;
    uint32_t *ptr1, *ptr2;
    foo = (float)*(ptr1) / (float)*(ptr2);

In NASM syntax (disassembled from the object file compiled with GCC) it looks like this:

    mov     rax, QWORD [ptr1]
    mov     eax, DWORD [rax]
    mov     eax, eax
    test    rax, rax
    js      ?_001
    pxor    xmm0, xmm0
    cvtsi2ss xmm0, rax
    jmp     ?_002

?_001:
    mov     rdx, rax
    shr     rdx, 1
    and     eax, 01H
    or      rdx, rax
    pxor    xmm0, xmm0
    cvtsi2ss xmm0, rdx
    addss   xmm0, xmm0
?_002:
    mov     rax, QWORD [ptr2]

; ... for ptr2 pattern repeats

What does this "black magic" under ?_001 mean? Isn't cvtsi2ss alone enough to convert an integer to a float?

Solution

In general, cvtsi2ss does the trick: it converts a scalar integer (other sources call it "doubleword integer to scalar single", but my naming is consistent with the other vector instructions) to a scalar single (float). However, it expects a signed integer.
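To see why the signed interpretation matters, here is a minimal C sketch. The helper name as_signed_to_float is hypothetical (it is not anything GCC emits); it just reinterprets the 32-bit pattern as signed before converting, which is roughly what a 32-bit signed conversion would do with an unsigned value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only: treat the 32-bit pattern as a
   signed integer and convert that to float. */
static float as_signed_to_float(uint32_t bits)
{
    int32_t s;
    memcpy(&s, &bits, sizeof s);  /* reinterpret the bits, no value change */
    return (float)s;              /* signed integer -> float */
}
```

For small values both interpretations agree, but once the top bit is set the signed reading gives a negative float, while converting the same value as unsigned gives a large positive one.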

So this code

    mov     rdx, rax
    shr     rdx, 1
    and     eax, 01H
    or      rdx, rax
    pxor    xmm0, xmm0
    cvtsi2ss xmm0, rdx
    addss   xmm0, xmm0

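works around this for unsigned values (note the js jump: this code is executed only if the sign bit is set, otherwise it is skipped). Because the test is done on the 64-bit rax, the sign bit here is bit 63, so the branch is taken for values greater than 0x7FFFFFFFFFFFFFFF. A uint32_t that has been zero-extended into rax (the mov eax, ... instructions clear the upper 32 bits) never sets that bit, so in your code this path is just the generic unsigned 64-bit fallback.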

So the "magic" code does:

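    mov     rdx, rax       ; copy the value loaded from *ptr1 into rdx
    shr     rdx, 1         ; divide by 2 (logical shift, not arithmetic, because the value is unsigned)
    and     eax, 01H       ; keep only the least significant bit (the one shifted out)
    or      rdx, rax       ; OR it back in as a "sticky" bit so the conversion still rounds correctly
    pxor    xmm0, xmm0     ; zero xmm0 (cvtsi2ss writes only the low 32 bits)
    cvtsi2ss xmm0, rdx     ; signed conversion is safe now: bit 63 of rdx is clear
    addss   xmm0, xmm0     ; add to itself = multiply by 2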
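The same steps can be sketched in C. This is only an illustration of the instruction sequence above, not GCC's actual source; the function name u64_to_float is made up, and it assumes the usual round-to-nearest mode.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the assembly: convert an unsigned 64-bit
   value to float using only a signed 64-bit -> float conversion. */
static float u64_to_float(uint64_t x)
{
    if (x <= (uint64_t)INT64_MAX) {
        /* Bit 63 is clear: the signed conversion gives the right answer. */
        return (float)(int64_t)x;
    }
    /* Bit 63 is set: halve the value so it fits in a signed 64-bit integer.
       OR the shifted-out bit back in so rounding sees that it was nonzero. */
    uint64_t half = (x >> 1) | (x & 1u);
    float f = (float)(int64_t)half;
    return f + f;  /* doubling a float is exact, so this undoes the halving */
}
```

The OR with the low bit matters only in rare tie cases. For example, 0x8000008000000001 would round down without it, because the halved value would look like an exact tie; with the bit kept, it rounds up as it should.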

I'm not sure which compiler and which compile options you used; for me, GCC simply generates

    cvtsi2ssq       xmm0, rbx
    cvtsi2ssq       xmm1, rax
    divss   xmm0, xmm1
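A plausible reading of why the short form is enough, assuming rbx and rax hold the zero-extended 32-bit values: every uint32_t fits in a signed 64-bit integer, so a single signed 64-bit conversion is always correct. A minimal sketch (u32_to_float is a hypothetical name):

```c
#include <stdint.h>

/* Hypothetical helper: widen to signed 64-bit first, then convert.
   The widened value is always in 0 .. 2^32-1, so it is never negative. */
static float u32_to_float(uint32_t x)
{
    int64_t wide = (int64_t)x;  /* zero-extension, value preserved */
    return (float)wide;         /* one signed 64-bit -> float conversion */
}
```

No sign test or halving is needed in this case, because the sign bit of the widened value can never be set.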

I hope it helps.