
FPU operations generated by GCC when casting an integer to float


I want to perform division on an FPU in C (using integer values):

float foo;
uint32_t *ptr1, *ptr2;
foo = (float)*(ptr1) / (float)*(ptr2);

In NASM syntax (disassembled from the object file compiled with GCC), it looks like this:

    mov     rax, QWORD [ptr1]
    mov     eax, DWORD [rax]
    mov     eax, eax
    test    rax, rax
    js      ?_001
    pxor    xmm0, xmm0
    cvtsi2ss xmm0, rax
    jmp     ?_002

?_001:
    mov     rdx, rax
    shr     rdx, 1
    and     eax, 01H
    or      rdx, rax
    pxor    xmm0, xmm0
    cvtsi2ss xmm0, rdx
    addss   xmm0, xmm0
?_002:
    mov     rax, QWORD [ptr2]

; ... the same pattern repeats for ptr2

What does this "black magic" under ?_001 mean? Isn't cvtsi2ss alone enough to convert an integer to float?


Solution

  • In general, cvtsi2ss does the trick: it converts a scalar integer to a scalar single (float). (Other sources call it "doubleword integer to scalar single", but my naming is consistent with the other vector instructions.) However, it expects a signed integer.
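
    To see why a signed-only conversion is not enough on its own, here is a minimal C sketch. The helper names are hypothetical (they are not from the question); the helper reads the 32 bits of a uint32_t as if they were signed, which is how a 32-bit signed conversion would see them.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical helper: treats the 32 bits as signed, the way a 32-bit
       signed conversion would see them. */
    float bits_as_signed_to_float(uint32_t v)
    {
        int32_t s;
        memcpy(&s, &v, sizeof s);   /* reinterpret the bits, no value conversion */
        return (float)s;
    }

    /* Small self-check of the claim above. */
    void check_signed_vs_unsigned(void)
    {
        /* Below 2^31 both readings agree. */
        assert(bits_as_signed_to_float(5u) == (float)5u);
        /* With the top bit set, the signed reading is negative... */
        assert(bits_as_signed_to_float(0x80000000u) < 0.0f);
        /* ...while the real unsigned value is positive. */
        assert((float)0x80000000u > 0.0f);
    }
    ```

    So any uint32_t with its top bit set has to be widened or otherwise adjusted before a signed conversion can be used.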

    So this code

    mov     rdx, rax                                
    shr     rdx, 1                                  
    and     eax, 01H                                
    or      rdx, rax                                
    pxor    xmm0, xmm0                              
    cvtsi2ss xmm0, rdx                              
    addss   xmm0, xmm0  
    

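    helps convert an unsigned value with an instruction that only understands signed ones. Note the js jump: this code runs only if the sign bit is set and is skipped otherwise. Because the test is on the full 64-bit rax, "sign bit" here means bit 63, i.e. a value greater than 0x7FFFFFFFFFFFFFFF. A uint32_t that has been zero-extended by mov eax, eax never has that bit set, so for this input the ?_001 path is never actually taken; it looks like the generic unsigned 64-bit conversion sequence, left in place by an unoptimized build.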

    So the "magic" code does:

    mov     rdx, rax       ; copy the value to rdx
    shr     rdx, 1         ; divide by 2 (logical shift, not arithmetic, because the value is unsigned)
    and     eax, 01H       ; keep only the least significant bit
    or      rdx, rax       ; OR that bit back in as a "sticky" bit so the final rounding stays correct
    pxor    xmm0, xmm0     ; zero xmm0 (breaks the dependency on its old contents)
    cvtsi2ss xmm0, rdx     ; signed conversion; rdx now has its sign bit clear
    addss   xmm0, xmm0     ; add to itself = multiply by 2
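
    The same steps can be sketched in C. This is only an illustration under the assumption that the sequence is the usual unsigned 64-bit to float conversion; the function names are hypothetical, and the self-check compares the result with the compiler's own (float) cast under the default rounding mode.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper mirroring the two paths of the assembly. */
    float u64_to_float_sketch(uint64_t x)
    {
        if ((x >> 63) == 0) {
            /* Sign bit clear: a plain signed conversion is correct. */
            return (float)(int64_t)x;
        }
        /* Sign bit set: halve, keeping the shifted-out bit as a sticky bit. */
        uint64_t half = (x >> 1) | (x & 1u);
        float f = (float)(int64_t)half;  /* half < 2^63, so it is non-negative */
        return f + f;                    /* doubling is exact here (no overflow) */
    }

    /* Compare against the compiler's direct cast on a few sample values. */
    void check_u64_to_float_sketch(void)
    {
        const uint64_t samples[] = {
            0u, 1u, 0xFFFFFFFFu,
            UINT64_C(1) << 63,
            /* just above a rounding tie: the sticky bit decides the result */
            (UINT64_C(1) << 63) + (UINT64_C(1) << 39) + 1u,
            UINT64_MAX
        };
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
            assert(u64_to_float_sketch(samples[i]) == (float)samples[i]);
        }
    }
    ```

    Without the sticky bit, the fifth sample would become an exact tie after halving and round down instead of up, which is why the low bit is ORed back in rather than dropped.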
    

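    I'm not sure which compiler and compile options you are using. For me, GCC simply emits the following, which is enough because a zero-extended uint32_t is never negative when read as a signed 64-bit integer: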

    cvtsi2ssq       xmm0, rbx
    cvtsi2ssq       xmm1, rax
    divss   xmm0, xmm1
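
    In C terms, that shorter sequence corresponds roughly to widening each uint32_t to a signed 64-bit integer before converting. A minimal sketch, with hypothetical function names, and no claim about which instructions a particular compiler will pick for it:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Widen first: every uint32_t fits in int64_t as a non-negative value. */
    float u32_to_float_widened(uint32_t v)
    {
        int64_t wide = (int64_t)v;   /* always in [0, 2^32), never negative */
        return (float)wide;          /* signed 64-bit to float conversion */
    }

    /* The division from the question, written with the helper above. */
    float ratio(const uint32_t *p1, const uint32_t *p2)
    {
        return u32_to_float_widened(*p1) / u32_to_float_widened(*p2);
    }

    void check_widened(void)
    {
        uint32_t a = 6u, b = 3u;
        assert(ratio(&a, &b) == 2.0f);
        /* Matches the direct unsigned cast even with the top bit set. */
        assert(u32_to_float_widened(0xFFFFFFFFu) == (float)0xFFFFFFFFu);
    }
    ```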
    

    I hope it helps.