c++, optimization, g++

g++ optimizes away check for INT_MIN in release build


I encountered a problem where g++ optimized out something it should not have. I reduced it to the following example: I have a static lib with a function bool my_magic_function(int* x) that decrements *x by 1 if it can; otherwise (*x == INT_MIN) it returns false and leaves the original value untouched. In a debug build the function works as expected, but in a release build the check is optimized away. Platforms tested:

On RHEL 9.3 with g++ (GCC) 11.4.1 20230605 -> Problem present

Ubuntu 22.04 with g++ 11.4.0 or 10.5.0 -> Problem present

Ubuntu 22.04 g++ 9.5.0 -> Code works as expected in release too.

Here is a minimal example with a static lib and a simple main.cpp using the function:

almalib.h:

bool my_magic_function(int* x); 

almalib.cc:

#include "almalib.h"
#include <cstring>
#include <limits>

bool my_magic_function(int* x) {
    int cp_new;
    // integer overflow is undefined, so let's make sure it becomes INT_MAX
    if (*x == std::numeric_limits<int>::lowest()) {
        cp_new = std::numeric_limits<int>::max(); 
    } else {
        cp_new = *x - 1;
    }  
    if (cp_new < *x) {
        *x = cp_new;
        return true;    
    }
    return false;
} 

main.cpp:

#include "almalib.h"
#include <iostream>
#include <limits>

int main()
{
    for (int x : {0, std::numeric_limits<int>::lowest()})
    {
        int x2 = x;
        std::cerr << "Res for " << x << " " << (my_magic_function(&x2) ? "OK" : "NOT_OK") << " val: " << x2 << std::endl;
    }
}

Compile:

g++ -c almalib.cc -o almalib.o
ar crf libalma.a almalib.o
g++ main.cpp -o run -L. -lalma

g++ -c almalib.cc -O3 -o almalibR.o
ar crf libalmaR.a almalibR.o
g++ main.cpp -O3 -o runR -L. -lalmaR

Output for Debug (./run):

Res for 0 OK val: -1
Res for -2147483648 NOT_OK val: -2147483648

Output for Release (./runR):

Res for 0 OK val: -1
Res for -2147483648 OK val: 2147483647

Going through the generated assembly with gdb, my_magic_function is reduced to three instructions:

0x401320 <_Z17my_magic_functionPi>      subl   $0x1,(%rdi)
0x401323 <_Z17my_magic_functionPi+3>    mov    $0x1,%eax
0x401328 <_Z17my_magic_functionPi+8>    ret

My questions are:

  • Is this a known issue?
  • What are my options to prevent it from happening? (I can trivially rewrite the example function, but not the original problem). Are there any compiler hints, or should I disable a certain optimization type?

Solution

  • These can be expensive, but -fwrapv and -ftrapv both make your problem evaporate.

    -fwrapv tells the compiler to assume that signed integers wrap around on overflow, like unsigned integers do. This is what your hardware almost certainly does. -ftrapv instead makes signed overflow trap (raise an exception); if your hardware has no flag for that, the compiler inserts logic to catch it.

    With either flag, your code acts correctly.

    While -fwrapv seems harmless, it means a number of optimizations on loops and comparisons can no longer be performed.

    Without -fwrapv, the compiler can assume that a + b, with both operands greater than 0, is greater than a and greater than b. With -fwrapv, it cannot.

    As a guess, your compiler is first taking the early branch code

    if (*x == std::numeric_limits<int>::lowest()) {
        cp_new = std::numeric_limits<int>::max(); 
    } else {
        cp_new = *x - 1;
    }
    

    and saying "on the hardware target, this is equivalent to"

    cp_new = *x - 1;
    

    because it knows the target hardware wraps on signed underflow. A significant optimization: it eliminates a needless branch!

    It then looks at

    if (cp_new < *x) {
        *x = cp_new;
        return true;    
    }
    

    then replaces cp_new:

    if ((*x - 1) < *x) {
        *x = (*x - 1);
        return true;    
    }
    

    and reasons "well, signed underflow is undefined behavior, so something minus 1 is always less than something". Thus optimizing it into:

    *x = *x-1;
    return true;    
    

    the error being that it used cp_new = *x - 1 in a context where underflow is defined and wraps around first, then reused it without allowing for the wrap around case.

    By making underflow trap (-ftrapv) or making wraparound defined behavior (-fwrapv), we block the assumption that enabled the second, invalid optimization.

    But this story of why -fwrapv/-ftrapv work is a "just so story": it is not informed by actually reading the gcc code or bug reports. It is a guess I made about the cause of the bug, which led to the idea of adjusting the overflow settings, which did fix your symptoms. Consider it a fairy tale explaining why -fwrapv fixes your bug.