Context:
I want to verify that, with 32-bit integers, 0x80000000 - 1 = 0x7FFFFFFF, so if both values are interpreted as signed integers, the sign changes from negative to positive.
Here goes the first version:
#include <stdio.h>
int main() {
    int x = 0x80000000;
    printf("x's value is 0x%x, representing integer %d\n", x, x);
    if (x - 1 > 0)
        printf("0x%x - 1 > 0\n", x);
    else
        printf("0x%x - 1 = 0x%x, which represents %d\n", x, x - 1, x - 1);
    return 0;
}
Running it, I get:
x's value is 0x80000000, representing integer -2147483648
0x80000000 - 1 = 0x7fffffff, which represents 2147483647
The second line of output shows that x - 1 = 0x7fffffff, which represents 2147483647 > 0, yet the statement inside the if wasn't run, which means the comparison saw x - 1 <= 0. That's a contradiction.
Then I made the second version:
#include <stdio.h>
int main() {
    int x = 0x80000000;
    printf("x's value is 0x%x, representing integer %d\n", x, x);
    int y = x - 1;
    if (y > 0)
        printf("0x%x - 1 > 0\n", x);
    else
        printf("0x%x - 1 = 0x%x, which represents %d\n", x, x - 1, x - 1);
    return 0;
}
This time the program ran as expected:
x's value is 0x80000000, representing integer -2147483648
0x80000000 - 1 > 0
Question:
I don't see what the difference is. From my understanding, if (x - 1 > 0) first calculates x - 1 and then compares the result to 0.
I am using gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Signed over/underflow is undefined behavior, meaning that anything can happen and we can't assume any particular outcome. The flaw in your reasoning is "as expected" - nothing is expected here.
Analyzing how we ended up with one particular behavior out of several possible ones often isn't very meaningful, but sure, we can do that...
In this case, compiling your code with gcc at maximum optimization, the first version results in this:
- The value -2147483648 is pre-loaded into registers, then printed with the first printf.
- The if never happens; it is optimized out since the compiler can predict it: x - 1, where x is known to be negative, can never be > 0 (see the sketch after these lists).
- The else branch is taken, and 2147483647 is pre-loaded into registers for the second printf, printed along with -2147483648.

In the second example:

- The value -2147483648 is pre-loaded into registers, then printed with the first printf.
- int y = x - 1; never happens, nor does the if; all of it is optimized away.
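To make that middle bullet concrete: because the compiler may assume signed arithmetic never overflows, it is entitled to rewrite the comparison algebraically, roughly like this (an illustrative sketch, not gcc's actual internal representation):

/* Equivalent only if x - 1 never overflows, which the compiler
   may assume for signed int: */
int cmp_original (int x) { return x - 1 > 0; }  /* UB when x == INT_MIN */
int cmp_rewritten(int x) { return x > 1; }      /* what the optimizer may use */

With x known at compile time to be -2147483648, x > 1 is plainly false, so the else branch can be chosen unconditionally.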
Now the compiler can't just assume "this can never be positive"; it has to consider some sort of value getting loaded into y, because the optimized code must behave as if a value were stored inside an int and the result then compared with > 0. Storing a value is a side effect, and a compiler is only allowed to optimize out side effects if it can deduce that such an optimization doesn't change the way the code behaves. (Which is kind of silly here, since there is no expected behavior to begin with.)
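One way to make the store genuinely observable, so the compiler can't reason about the stored value at all, is volatile. A sketch (the subtraction is still undefined behavior; only the store and re-read are forced):

#include <stdio.h>
int main(void) {
    int x = 0x80000000;
    volatile int y = x - 1;  /* a store to a volatile object is an observable
                                side effect and must not be optimized away */
    if (y > 0)
        printf("wrapped around to %d\n", y);
    else
        printf("stayed non-positive: %d\n", y);
    return 0;
}

On a typical two's complement machine this will most likely print the wrapped-around value, but that's an observation about one compiler on one system, not a guarantee.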
So it takes the first branch because apparently on this particular attempt on this particular system, an underflow resulted in wrap-around behavior.
So by analyzing the code we learnt basically nothing of value, except that code without bugs is good and relying on undefined behavior is bad, since small tweaks to code with undefined behavior could result in a completely different outcome.
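If the goal is just to observe 0x80000000 - 1 = 0x7FFFFFFF with fully defined behavior, one option is to do the arithmetic in unsigned int, where wrap-around is guaranteed by the standard. A minimal sketch (the memcpy reinterpretation assumes two's complement, which every mainstream platform uses):

#include <stdio.h>
#include <string.h>

int main(void) {
    unsigned int x = 0x80000000u;  /* unsigned arithmetic wraps, no UB */
    unsigned int y = x - 1u;       /* well-defined: 0x7fffffff */
    printf("0x%x - 1 = 0x%x\n", x, y);

    /* Reinterpret the same bits as signed to see the sign change: */
    int sx, sy;
    memcpy(&sx, &x, sizeof sx);
    memcpy(&sy, &y, sizeof sy);
    printf("as signed: %d -> %d\n", sx, sy);
    return 0;
}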
Note that assigning 0x80000000 to a 32-bit int involves an unsigned-to-signed conversion, which is implementation-defined. This is because hex literals that can't fit inside an int are given the type unsigned int if they fit there, which is the case here.
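A quick way to check what type a literal actually gets, assuming C11 and 32-bit int (a demonstrative sketch):

#include <stdio.h>
int main(void) {
    /* With 32-bit int, 0x80000000 doesn't fit in int, so among the
       candidate types for a hex literal (int, unsigned int, long, ...)
       it ends up as unsigned int: */
    puts(_Generic(0x80000000,
                  int:          "int",
                  unsigned int: "unsigned int",
                  long:         "long",
                  default:      "something else"));
    return 0;
}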