I encountered a problem where g++ optimized out something it should not have. I reduced the problem to the following example:
I have a static lib with a function bool my_magic_function(int* x), which decrements *x by 1 if it can; otherwise (when *x == INT_MIN) it returns false and leaves the original value untouched.
If I use the function in a debug build, it works as expected. But in a release build the check is optimized away. Platforms:
RHEL 9.3 with g++ (GCC) 11.4.1 20230605 -> Problem present
Ubuntu 22.04 with g++ 11.4.0 or g++ 10.5.0 -> Problem present
Ubuntu 22.04 with g++ 9.5.0 -> Code works as expected in release too
Here is a minimal example with a static lib and a simple main.cpp using the function:
almalib.h:
bool my_magic_function(int* x);
almalib.cc:
#include "almalib.h"
#include <cstring>
#include <limits>
bool my_magic_function(int* x) {
    int cp_new;
    // integer overflow is undefined, so let's make sure it becomes int max
    if (*x == std::numeric_limits<int>::lowest()) {
        cp_new = std::numeric_limits<int>::max();
    } else {
        cp_new = *x - 1;
    }
    if (cp_new < *x) {
        *x = cp_new;
        return true;
    }
    return false;
}
main.cpp:
#include "almalib.h"
#include <iostream>
#include <limits>
int main()
{
    for (int x : {0, std::numeric_limits<int>::lowest()})
    {
        int x2 = x;
        std::cerr << "Res for " << x << " " << (my_magic_function(&x2) ? "OK" : "NOT_OK") << " val: " << x2 << std::endl;
    }
}
Compile:
g++ -c almalib.cc -o almalib.o
ar crf libalma.a almalib.o
g++ main.cpp -o run -L. -lalma
g++ -c almalib.cc -O3 -o almalibR.o
ar crf libalmaR.a almalibR.o
g++ main.cpp -O3 -o runR -L. -lalmaR
Output for Debug (./run):
Res for 0 OK val: -1
Res for -2147483648 NOT_OK val: -2147483648
Output for Release (./runR):
Res for 0 OK val: -1
Res for -2147483648 OK val: 2147483647
Going through the generated assembly with gdb, my_magic_function is reduced to three instructions:
0x401320 <_Z17my_magic_functionPi> subl $0x1,(%rdi)
0x401323 <_Z17my_magic_functionPi+3> mov $0x1,%eax
0x401328 <_Z17my_magic_functionPi+8> ret
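For reference, the same listing can be pulled straight out of the binary with objdump (-C demangles the C++ symbol name; the grep is just a convenience to show the instructions after the label):
objdump -d -C runR | grep -A3 '<my_magic_function'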
My questions are:
Both -fwrapv and -ftrapv can be expensive, but either one makes your problem evaporate.
-fwrapv means the compiler assumes signed integers act like unsigned integers and wrap around. This is what your hardware almost certainly does. -ftrapv means it adds traps (exceptions) for when signed integers wrap around (you can probably set flags on your hardware to get this to happen; if not, it will add in logic to catch it).
With either flag, your code acts correctly.
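For example, taking the release commands from the question and adding the flag (a sketch; the miscompiled code lives in almalib.cc, but passing the flag to both translation units keeps the semantics consistent, and -ftrapv would go in the same place):
g++ -c almalib.cc -O3 -fwrapv -o almalibR.o
ar crf libalmaR.a almalibR.o
g++ main.cpp -O3 -fwrapv -o runR -L. -lalmaR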
While -fwrapv seems harmless, what it means is that a bunch of optimizations in loops and comparisons cannot be done. Without -fwrapv, the compiler can assume that a+b, with both greater than 0, is greater than a and greater than b. With it, it cannot.
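A minimal sketch of that kind of folding (my own illustration, not code from the question):
// Without -fwrapv, gcc may fold "a + b > a" down to "b > 0", because it is
// allowed to assume the signed addition never overflows. With -fwrapv, a + b
// can wrap to a negative value for large positive inputs, so the comparison
// has to be kept as written.
bool sum_exceeds(int a, int b) {
    return a + b > a;   // e.g. sum_exceeds(INT_MAX, 1) is false with -fwrapv
}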
As a guess, your compiler is first taking the early branch code
if (*x == std::numeric_limits<int>::lowest()) {
    cp_new = std::numeric_limits<int>::max();
} else {
    cp_new = *x - 1;
}
and saying "on the hardware target, this is equivalent to"
cp_new = *x - 1;
because it knows the hardware target has signed underflow that wraps around. A significant optimization: it eliminates a needless branch!
It then looks at
if (cp_new < *x) {
    *x = cp_new;
    return true;
}
and replaces cp_new with that expression:
if ((*x - 1) < *x) {
    *x = (*x - 1);
    return true;
}
and reasons "well, signed underflow is undefined behavior, so something minus 1 is always less than something", optimizing it into:
*x = *x - 1;
return true;
The error is that it used cp_new = *x - 1 in a context where the underflow was defined and wraps around first, then reused that expression without allowing for the wrap-around case. By making underflow cause a trap (-ftrapv), or making it defined wrap-around behavior (-fwrapv), we block the assumptions that let the compiler do the second, invalid optimization.
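Putting the two rewrites together, the release build effectively executes the following (reconstructed from the three-instruction disassembly in the question, not taken from gcc's intermediate output):
bool my_magic_function(int* x) {
    *x = *x - 1;   // subl $0x1,(%rdi): wraps INT_MIN around to INT_MAX on x86-64
    return true;   // mov $0x1,%eax ; ret
}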
But this story - why -fwrapv/-ftrapv work - is a "just so story"; it is not informed by actually reading the gcc code or bug reports. It is a guess I made as to the cause of the bug, which led to the idea of messing with the overflow settings, which did fix your symptoms. Consider it a fairy tale explaining why -fwrapv fixes your bug.