
Confused by difference between expression inside if and expression outside if


Context:

I want to verify the fact that under 32 bits, 0x80000000 - 1 = 0x7FFFFFFF, so if both values are interpreted as signed integers, the sign changes from negative to positive.

Here goes the first version:

#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("x's value is Ox%x, representing integer %d\n", x, x);
    if (x - 1 > 0)
        printf("Ox%x - 1 > 0\n", x);
    else
        printf("Ox%x - 1 = Ox%x, which reprensents %d\n", x, x-1, x-1);
    return 0;
}

Running it, I get:

x's value is 0x80000000, representing integer -2147483648
0x80000000 - 1 = 0x7fffffff, which represents 2147483647

From the second line of output, x - 1 is 0x7fffffff, i.e. 2147483647 > 0, yet the statement inside the if isn't run, which means x - 1 > 0 was false. That's a contradiction.

Then I made the second version:

#include <stdio.h>

int main() {
    int x = 0x80000000;
    printf("x's value is Ox%x, representing integer %d\n", x, x);
    int y =  x - 1;
    if (y > 0)
        printf("Ox%x - 1 > 0\n", x);
    else
        printf("Ox%x - 1 = Ox%x, which reprensents %d\n", x, x-1, x-1);
    return 0;
}

This time the program ran as expected:

x's value is 0x80000000, representing integer -2147483648
0x80000000 - 1 > 0

Question:

I don't see what the difference is. From my understanding, if (x - 1 > 0) first calculates x - 1 and then compares it to 0.

I am using gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0


Solution

  • Signed over/underflow is undefined behavior, meaning that anything can happen and we can't assume any particular outcome. The flaw in your reasoning is "as expected" - nothing is expected here.

    Analyzing how we ended up with one particular behavior out of the many possible ones often isn't very meaningful, but sure, we can do that...

    In this case, compiling your code with gcc at maximum optimization, the first program results in this (a rough sketch in code follows the list):

    • The value -2147483648 is pre-loaded into registers then printed with the first printf.
    • The if is never evaluated at run-time; it is optimized out since the compiler can predict its outcome.
    • Since a negative signed number in C can never become positive by subtracting from it (since that would invoke undefined behavior), the compiler is free to assume that any expression x - 1 where x is known to be negative can never be > 0.
    • Therefore the else branch is taken and 2147483647 is pre-loaded into registers for the second printf and printed along with -2147483648.
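
    Conceptually, the optimized first program then behaves roughly like the following sketch (an assumption based on the analysis above, not actual gcc output):

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* Only the else branch survives: the compiler assumes x - 1 > 0 can
           never hold for a negative x, and folds the constants at compile
           time. */
        printf("x's value is 0x%x, representing integer %d\n",
               (unsigned int)INT_MIN, INT_MIN);
        printf("0x%x - 1 = 0x%x, which represents %d\n",
               (unsigned int)INT_MIN, (unsigned int)INT_MAX, INT_MAX);
        return 0;
    }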

    In the second example:

    • The value -2147483648 is pre-loaded into registers then printed with the first printf.

    • The statement int y = x - 1; never happens at run-time, nor does the if; all of it is optimized away.

    • Now the compiler can't just assume "this can never be positive"; it has to consider some sort of value getting stored into y, because the optimized code must behave as if a value were stored inside an int and the result then compared with > 0. Storing a value is a side effect, and a compiler is only allowed to optimize out side effects if it can deduce that doing so doesn't change the way the code behaves. (Which is kind of silly here, since there is no expected behavior.)

    • So it takes the first branch because, apparently, on this particular attempt on this particular system, the underflow resulted in wrap-around behavior.
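
    By the same reasoning, the second program plausibly collapses to something like this (again an assumption, not verified disassembly): the value stored into y happened to wrap to INT_MAX, so only the "> 0" branch remains.

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* Sketch of the second program after optimization on this particular
           run; nothing guarantees this outcome, since the subtraction still
           underflows a signed int. */
        printf("x's value is 0x%x, representing integer %d\n",
               (unsigned int)INT_MIN, INT_MIN);
        printf("0x%x - 1 > 0\n", (unsigned int)INT_MIN);
        return 0;
    }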

    So by analyzing the code we learnt basically nothing of value, except that code without bugs is good and relying on undefined behavior is bad, since small tweaks to code with undefined behavior can result in a completely different outcome.
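
    To make that concrete, here is a minimal sketch of how the original fact could be verified without relying on undefined behavior: do the arithmetic in unsigned int, where wrap-around is well defined, and only then look at the result.

    #include <stdio.h>

    int main(void) {
        /* Unsigned arithmetic wraps by definition, so this does not invoke
           undefined behavior. */
        unsigned int x = 0x80000000u;
        unsigned int y = x - 1u;            /* well-defined wrap to 0x7fffffff */
        printf("0x%x - 1 = 0x%x\n", x, y);
        printf("as a signed int that bit pattern represents %d\n", (int)y);
        return 0;
    }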


    Note that assigning 0x80000000 to a 32-bit int is an unsigned-to-signed conversion whose result is implementation-defined (i.e. compiler-specific). This is because hex literals that can't fit inside an int are given the type unsigned int if they can fit there, which is the case here.
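
    A small sketch of that last point, assuming a 32-bit int: the literal 0x80000000 does not fit in int, so it has type unsigned int, and a portable way to get the smallest int value is INT_MIN from <limits.h>.

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        /* 0x80000000 has type unsigned int here, so this comparison is done
           in unsigned arithmetic and prints 1: */
        printf("0x80000000 > 0 evaluates to %d\n", 0x80000000 > 0);

        /* Portable way to get the smallest int value, no conversion needed: */
        int x = INT_MIN;
        printf("x = %d\n", x);
        return 0;
    }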