Tags: c++, gcc, compiler-construction, undefined-behavior, integer-overflow

Will gcc skip this check for signed integer overflow?


For example, given the following code:

int f(int n)
{
    if (n < 0)
        return 0;
    n = n + 100;
    if (n < 0)
        return 0;
    return n;
}

Assuming you pass in a number that is very close to overflowing (less than 100 below INT_MAX), will the compiler produce code that gives you a negative return?

Here is an excerpt about this issue from "The Descent to C" by Simon Tatham:

"The GNU C compiler (gcc) generates code for this function which can return a negative integer, if you pass in (for example) the maximum represent able ‘int’ value. Because the compiler knows after the first if statement that n is positive, and then it assumes that integer overflow does not occur and uses that assumption to conclude that the value of n after the addition must still be positive, so it completely removes the second if statement and returns the result of the addition unchecked."

It got me wondering if the same issue existed in C++ compilers, and if I should be careful that my integer overflow checks aren't skipped.


Solution

  • Short Answer

    Will a compiler definitely optimize away the check in your example? We can't say for all cases, but we can test against gcc 4.9 using the godbolt interactive compiler with the following code (see it live):

    int f(int n)
    {
        if (n < 0) return 0;
    
        n = n + 100;
    
        if (n < 0) return 0;
    
        return n;
    }
    
    int f2(int n)
    {
        if (n < 0) return 0;
    
        n = n + 100;
    
        return n;
    }
    

    and we see that it generates identical code for both versions, which means it is indeed eliding the second check:

    f(int):  
        leal    100(%rdi), %eax #, tmp88 
        testl   %edi, %edi  # n
        movl    $0, %edx    #, tmp89
        cmovs   %edx, %eax  # tmp88,, tmp89, D.2246
        ret
    f2(int):
        leal    100(%rdi), %eax #, tmp88
        testl   %edi, %edi  # n
        movl    $0, %edx    #, tmp89 
        cmovs   %edx, %eax  # tmp88,, tmp89, D.2249
        ret
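
    To make the consequence concrete, here is a small driver (the main function below is mine, not part of the original example). With optimizations enabled and the second check elided, passing INT_MAX can produce a negative result, although since the behavior is undefined nothing is guaranteed:

    #include <climits>
    #include <cstdio>
    
    int f(int n)
    {
        if (n < 0) return 0;
        n = n + 100;          // signed overflow here is undefined behavior
        if (n < 0) return 0;  // gcc may assume this is never true and drop it
        return n;
    }
    
    int main()
    {
        std::printf("%d\n", f(INT_MAX)); // with gcc -O2 this may print a negative number
    }

    Compiling with -fwrapv, which tells gcc to treat signed overflow as wrapping, removes the assumption this optimization relies on.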
    

    Long Answer

    When your code exhibits undefined behavior or relies on potential undefined behavior (in this example, signed integer overflow), then yes, the compiler can make assumptions and optimize around them: it can assume the undefined behavior never occurs and optimize according to that assumption. The most infamous example is probably the removal of a null check in the Linux kernel. The code was as follows:

    struct foo *s = ...;
    int x = s->f;
    if (!s) return ERROR;
    ... use s ..
    

    The logic used was that since s was dereferenced, it must not be a null pointer (otherwise that would be undefined behavior), and therefore the if (!s) check was optimized away. The linked article says:

    The problem is that the dereference of s in line 2 permits a compiler to infer that s is not null (if the pointer is null then the function is undefined; the compiler can simply ignore this case). Thus, the null check in line 3 gets silently optimized away and now the kernel contains an exploitable bug if an attacker can find a way to invoke this code with a null pointer.
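
    To see the same pattern outside of kernel code, here is a minimal self-contained sketch (the struct and function names are made up for illustration); compiled with optimizations, the null check can be dropped for exactly the reason described above:

    struct foo { int f; };
    
    int get_field(foo *s)
    {
        int x = s->f;        // dereference: the compiler may infer s is non-null
        if (!s) return -1;   // so this check can be silently removed
        return x;
    }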

    This applies equally to C and C++, which have similar language around undefined behavior: in both cases the standard tells us that the results of undefined behavior are unpredictable, although what is specifically undefined may differ between the two languages. The draft C++ standard defines undefined behavior as follows:

    behavior for which this International Standard imposes no requirements

    and includes the following note (emphasis mine):

    Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.

    The draft C11 standard has similar language.

    Proper signed overflow check

    Your check is not the proper way to guard against signed integer overflow: you need to check before you execute the operation, and skip the operation if it would overflow. CERT has a good reference on how to prevent signed integer overflow for the various operations. For the addition case it recommends the following:

    #include <limits.h>
    
    void f(signed int si_a, signed int si_b) {
      signed int sum;
      if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
          ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
      } else {
        sum = si_a + si_b;
      }
    }
    

    If we plug this code into godbolt we can see that the checks are elided, which is the behavior we would expect.
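
    If you want to reuse the pattern, the same pre-condition test can be wrapped in a helper. This is only a sketch of mine, not something from the CERT page (the name checked_add is made up):

    #include <climits>
    
    // Returns true and writes the sum if it is representable;
    // returns false otherwise. No signed overflow can occur.
    bool checked_add(int a, int b, int *out)
    {
        if ((b > 0 && a > INT_MAX - b) ||
            (b < 0 && a < INT_MIN - b))
            return false;
        *out = a + b;
        return true;
    }

    Because the test happens before the addition, there is no undefined behavior for the compiler to reason about, so the check cannot be removed on those grounds.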