Search code examples
ccompiler-optimizationc99signed

operation reordering and signedness in C


I found several questions discussing compiler optimizations for signed vs unsigned. The general conclusion is that signed allows for more optimizations due to the undefined behavior of overflow. However, I didn't find any discussion on how signedness plays with operation reordering.

Consider the expression

a - b + c

If all values are unsigned, the compiler can always reorder the operations (adding a and c first may improve performance).

If all values are signed, the compiler must prove that reordering will not cause an overflow to occur. In general a + c could overflow so compiler is constrained in what it can reorder.

Am I correct that the compiler has more freedom to reorder operations for unsigned values?


EDIT

I was going to ask about other operations like * and / but I guess this question only makes sense for + and -. In general no reordering is possible for muliply and divide no matter whether the integers are signed or unsigned.


Solution

  • If all values are signed, the compiler must prove that reordering will not cause an overflow to occur.

    This premise is false. If there is no undefined behavior in the C program, the compiler’s burden is to produce that behavior. The C standard does not constrain how it does it. If it reorders a - b + c to a + c - b but it does so using instructions that behave appropriately, then a + c - b will calculate the same result as a - b + c.

    Consider an example using 32-bit int objects: a is 7FFFFFF016, b is 10016, and c is 2016. Then a - b would be 7FFFFEF016, and a - b + c would be 7FFFFF1016. If the compiler computes a + c using an add instruction that produces 8000001016, and then computes a + c - b using a subtract instruction that produces 7FFFFF1016, then the result is correct.

    This works because the add and subtract instructions do not “care” about overflow and produces the desired results even if they go outside of int bounds. Not only are such instructions common on today’s processors, the same add and subtract instructions are used for both signed and unsigned arithmetic, because the bit patterns for addition and subtraction are the same for unsigned and two’s complement representations. (Comparison instructions differ, because the bits have different meanings when interpreted as unsigned or two’s complement.)

    The fact that if the source code had been literally changed to a + c - b, the program would have had undefined behavior is irrelevant. A compiler typically does not operate by notionally rearranging the C source code to other C source code that is then compiled. Compilers typically operate by converting the C semantics to some internal representation and then generating assembly code to implement those semantics.