Tags: c++, c, low-latency

Compiler optimization on marking an int unsigned?


For an integer that is never expected to take negative values, one could use either unsigned int or int. From a compiler perspective, or purely in terms of CPU cycles, is there any difference on x86_64?


Solution

  • It depends. It might go either way, depending on what you are doing with that int as well as on the properties of the underlying hardware.


    An obvious example in unsigned int's favor would be the integer division operation. In C/C++ integer division is required to round towards zero, while the cheap machine-level replacements for division by a constant (arithmetic shifts, etc.) generally round towards negative infinity. So, in order to satisfy the standard's requirements, the compiler is forced to adjust the signed division result with additional machine instructions. In the case of unsigned division this problem does not arise, which is why division (at least by constants) generally works much faster for unsigned types than for signed types.
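
    To see the disagreement concretely, here is a minimal sketch (my own example, not part of the original answer) showing that truncating division and an arithmetic shift give different results for a negative operand (right-shifting a negative value is implementation-defined before C++20, but x86 compilers implement it as sar):

    #include <cstdio>

    int main() {
        int n = -3;
        std::printf("-3 / 2  = %d\n", n / 2);   // truncates toward zero: -1
        std::printf("-3 >> 1 = %d\n", n >> 1);  // arithmetic shift rounds down: -2
        return 0;
    }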

    For example, consider this simple expression

    rand() / 2
    

    The code generated for this expression by the MSVC compiler will generally look as follows

    call        rand              ; eax = rand()
    cdq                           ; sign-extend eax into edx (edx = 0 or -1)
    sub         eax,edx           ; add 1 to negative values so the shift truncates toward zero
    sar         eax,1             ; arithmetic shift right = divide by 2
    

    Note that instead of a single shift instruction (sar) we see a whole bunch of instructions here, i.e. our sar is preceded by two extra instructions (cdq and sub). These extra instructions are there just to "adjust" the division in order to force it to generate the "correct" (from the C language point of view) result. Note that the compiler does not know that your value will always be positive, so it has to generate these instructions unconditionally. They will never do anything useful, thus wasting CPU cycles.

    Now take a look at the code for

    (unsigned) rand() / 2
    

    It is just

    call        rand              ; eax = rand()
    shr         eax,1             ; logical shift right = unsigned divide by 2
    

    In this case a single shift did the trick, thus providing us with astronomically faster code (for the division alone).
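
    To reproduce the comparison yourself, here is a minimal sketch (the function names are my own) that you can feed to an optimizing compiler, e.g. "cl /O2 /FA" with MSVC or "g++ -O2 -S" with GCC, and compare the two listings:

    // div2.cpp -- compare the assembly generated for these two functions
    int      div2_signed(int x)        { return x / 2; }  // expect a cdq/sub/sar-style adjustment
    unsigned div2_unsigned(unsigned x) { return x / 2; }  // expect a single shr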


    On the other hand, when you are mixing integer arithmetic and x87 FPU floating-point arithmetic, signed integer types might work faster, since the FPU instruction set contains instructions for loading and storing signed integer values (fild, fist), but has no such instructions for unsigned integer values.

    To illustrate this one can use the following simple function

    double zero() { return rand(); }
    

    The generated code will generally be very simple

    call        rand                  ; eax = rand()
    mov         dword ptr [esp],eax   ; spill the result to memory for the FPU
    fild        dword ptr [esp]       ; load as a signed integer onto the FPU stack
    

    But if we change our function to

    double zero() { return (unsigned) rand(); }
    

    the generated code will change to

    call        rand
    test        eax,eax                ; set the sign flag on the (signed) value
    mov         dword ptr [esp],eax 
    fild        dword ptr [esp]        ; loads the bits as a signed integer
    jge         zero+17h               ; skip the fix-up if the value was non-negative
    fadd        qword ptr [__real@41f0000000000000 (4020F8h)]  ; add 2^32 to correct
    

    This code is noticeably larger because the FPU instruction set does not work with unsigned integer types: fild loads the 32-bit value as if it were signed, so when the original unsigned value has its top bit set the loaded result is off by exactly 2^32, and an extra adjustment is needed after the load (which is what that conditional fadd does).
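
    The fix-up works because the constant __real@41f0000000000000 is simply 2^32 encoded as a double. As a minimal sketch (my own naming, not the compiler's actual output), the same correction expressed in C++ looks like this:

    #include <cstdio>

    // Mimic the compiler's fix-up: read the bits as a signed value (what fild
    // does), then add 2^32 back when the sign bit was set.
    double unsigned_to_double(unsigned u) {
        // The unsigned-to-int conversion is implementation-defined before C++20
        // for values >= 2^31; mainstream compilers wrap, preserving the bit pattern.
        int as_signed = static_cast<int>(u);
        double d = as_signed;
        if (as_signed < 0)
            d += 4294967296.0;  // 2^32, i.e. the __real@41f0000000000000 constant
        return d;
    }

    int main() {
        std::printf("%f\n", unsigned_to_double(0x80000000u));  // prints 2147483648.000000
        return 0;
    }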


    There are other contexts and examples that can be used to demonstrate that it works either way. So, again, it all depends. But generally, all this will not matter in the big picture of your program's performance. I generally prefer to use unsigned types to represent unsigned quantities. In my code 99% of integer types are unsigned. But I do it for purely conceptual reasons, not for any performance gains.