c optimization casting floating-point x87

Is casting a float to an int done inside the processor in C?


I'm interested in how the compiler casts a float to an int for code like:

    float x_f = 3.1415;
    int x = (int)x_f;

I'm especially interested in speed. Is it super fast, like a built-in processor instruction, or does it require extra computation?

I also wonder whether anything changes if the float always contains an exact integer (e.g. x_f = 3.0000).

EDIT: This question is about gcc targeting Intel x86 processors.

EDIT2: Does anything change if x_f = 3.0?


Solution

  • It depends a lot on the particular cpu. Since you're interested in x86, the original 387 fpu has an instruction to convert float to integer, but it can't be used directly because it rounds according to the current rounding mode (round-to-nearest by default), whereas conversions in C are required to truncate, not round. Thus, the following function:

    int f(float x)
    {
        return x;
    }
    

    compiles to (with gcc -O3 -fno-asynchronous-unwind-tables, to avoid crud in the asm):

            .text
            .p2align 4,,15
            .globl  f
            .type   f, @function
    f:
            subl    $8, %esp
            fnstcw  6(%esp)
            movw    6(%esp), %ax
            movb    $12, %ah
            movw    %ax, 4(%esp)
            flds    12(%esp)
            fldcw   4(%esp)
            fistpl  (%esp)
            fldcw   6(%esp)
            movl    (%esp), %eax
            addl    $8, %esp
            ret
    

    What it's doing is saving, changing, and restoring the fpu control word to change the rounding mode (the movb $12, %ah sets the rounding-control bits to truncation).

    On the other hand, if you're building for a target that has SSE available for floating point, you get:

            .text
            .globl  f
            .type   f, @function
    f:
            cvttss2si       4(%esp), %eax
            ret
    

    So, it really depends.

    Finally, since you mentioned you're particularly interested in the case where the value is already a whole number: this does not make any difference to the cast itself. The cpu operations that convert almost surely don't care. However, in this case you can cheat: since you know the input is a whole number, rounding and truncation produce the same result, and you can use lrintf rather than casting or implicitly converting to int. This should be a major improvement on x86 targets not using SSE for math, especially if the compiler recognizes lrintf and inlines it. Here is the same function, using lrintf(x) instead of x, with the -fno-math-errno option added (otherwise gcc assumes libm might want to set errno and thus doesn't replace the call):

    f:
            pushl   %eax
            flds    8(%esp)
            fistpl  (%esp)
            movl    (%esp), %eax
            popl    %edx
            ret
    

    Note that gcc did a bad job of compiling this function; it could have generated:

    f:
            flds    4(%esp)
            fistpl  4(%esp)
            movl    4(%esp), %eax
            ret
    

    This is valid because the argument space on the stack belongs to the callee and may be clobbered at will. And even if it weren't, movl (%esp),%eax ; popl %edx when you don't care what ends up in edx is an idiotic way of writing popl %eax...