This is a complete rewrite of the question. Hopefully it is clearer now.
I want to implement in C a function that performs addition of signed ints with wrapping in case of overflow.
I want to target mainly the x86-64 architecture, but of course the more portable the implementation is the better. I'm also concerned mostly about producing decent assembly code through gcc, clang, icc, and whatever is used on Windows.
The goal is twofold:
By decent machine code I mean a single leal or a single addl instruction on machines which natively support the operation.
I'm able to satisfy either of the two requisites, but not both.
The first implementation that comes to mind is
    int add_wrap(int x, int y) {
        return (unsigned) x + (unsigned) y;
    }
This seems to work with gcc, clang and icc. However, as far as I know, the C standard doesn't specify the result of converting an out-of-range unsigned int value to signed int, leaving freedom to the implementations (see also here).
Otherwise, if the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
I believe most (all?) major compilers do the expected conversion from unsigned to int, meaning that they take the correct representative modulo 2^N, where N is the number of bits, but it's not mandated by the standard so it cannot be relied upon (stupid C standard hits again). Also, while this is the simplest thing to do on two's complement machines, it is impossible on ones' complement machines, because there is a residue class which is not representable: that of 2^(N-1).
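One way to avoid the implementation-defined conversion altogether is to map the unsigned sum back to int by hand. The following is only a sketch: the names to_int_wrap and add_wrap_portable are made up for illustration, and it assumes a two's complement range with no padding bits (the preprocessor checks reject other platforms). Whether a given compiler reduces it to a single addl or leal is not verified here.

```c
#include <limits.h>

/* Assumptions of this sketch: unsigned has exactly one more value bit
   than int, and INT_MIN == -INT_MAX - 1 (two's complement range). */
#if UINT_MAX / 2 != INT_MAX
#error "sketch assumes UINT_MAX == 2 * INT_MAX + 1"
#endif
#if INT_MIN + INT_MAX != -1
#error "sketch assumes a two's complement range for int"
#endif

/* Hypothetical helper: returns the int congruent to u modulo 2^N
   without ever converting an out-of-range value to int. */
static int to_int_wrap(unsigned u) {
    if (u <= (unsigned) INT_MAX)
        return (int) u;                /* value fits, conversion is exact */
    /* Here UINT_MAX - u lies in [0, INT_MAX], so every step stays in range. */
    return -(int) (UINT_MAX - u) - 1;
}

/* Hypothetical name: the unsigned addition wraps by definition,
   then the result is mapped back to int by the helper above. */
int add_wrap_portable(int x, int y) {
    return to_int_wrap((unsigned) x + (unsigned) y);
}
```

For example, add_wrap_portable(INT_MAX, 1) yields INT_MIN under the stated assumptions.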
According to the clang docs, one can use __builtin_add_overflow like this
    int add_wrap(int x, int y) {
        int res;
        __builtin_add_overflow(x, y, &res);
        return res;
    }
and this should do the trick with clang, because the docs clearly say
If possible, the result will be equal to mathematically-correct result and the builtin will return 0. Otherwise, the builtin will return 1 and the result will be equal to the unique value that is equivalent to the mathematically-correct result modulo two raised to the k power, where k is the number of bits in the result type.
The problem is that in the GCC docs they say
These built-in functions promote the first two operands into infinite precision signed type and perform addition on those promoted operands. The result is then cast to the type the third pointer argument points to and stored there.
As far as I know, casting from long int to int is implementation specific, so I don't see any guarantee that this will result in the wrapping behavior.
As you can see [here][godbolt], GCC will also generate the expected code, but I wanted to be sure that this is not by chance and is indeed part of the specification of __builtin_add_overflow.
icc also seems to produce something reasonable.
This produces decent assembly, but relies on intrinsics, so it's not really standard compliant C.
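On the standard-compliance point: C23 adds <stdckdint.h>, whose ckd_add macro is specified to store the wrapped result when the addition overflows. A hedged sketch of how one might prefer it and fall back to the builtin is below. The function name add_wrap_ckd is hypothetical, and the feature tests are assumptions about the toolchain: __has_include is not available everywhere, and GCC versions before 5 define __GNUC__ but lack the builtin.

```c
#include <limits.h>

/* Pull in the C23 checked-arithmetic header only if the toolchain ships it. */
#if defined(__has_include)
#  if __has_include(<stdckdint.h>)
#    include <stdckdint.h>
#  endif
#endif

/* Hypothetical name, not taken from any of the quoted docs. */
int add_wrap_ckd(int x, int y) {
#if defined(ckd_add)
    int res;
    (void) ckd_add(&res, x, y);      /* C23: on overflow, res holds the wrapped sum */
    return res;
#elif defined(__GNUC__) || defined(__clang__)
    int res;
    (void) __builtin_add_overflow(x, y, &res);   /* compiler extension */
    return res;
#else
    /* Last resort: wrap in unsigned arithmetic and map back by hand.
       Assumes UINT_MAX == 2 * INT_MAX + 1 and a two's complement range. */
    unsigned u = (unsigned) x + (unsigned) y;
    return u <= (unsigned) INT_MAX ? (int) u : -(int) (UINT_MAX - u) - 1;
#endif
}
```

The overflow flag is deliberately discarded in both branches, since only the wrapped value is wanted here.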
Follow the suggestions of those pedantic guys from SEI CERT C Coding Standard.
In their CERT INT32-C recommendation they explain how to check in advance for potential overflow. Here is what comes out following their advice:
    #include <limits.h>
    int add_wrap(int x, int y) {
        if ((x > 0) && (y > INT_MAX - x))
            return (x + INT_MIN) + (y + INT_MIN);
        else if ((x < 0) && (y < INT_MIN - x))
            return (x - INT_MIN) + (y - INT_MIN);
        else
            return x + y;
    }
The code performs the correct checks and compiles to leal with gcc, but not with clang or icc.
The whole CERT INT32-C recommendation is complete garbage, because it tries to transform C into a "safe" language by forcing programmers to perform checks that should be part of the definition of the language in the first place. In doing so it also forces the programmer to write code that the compiler can no longer optimize, so what reason is left to use C?!
The contrast is between compatibility and decency of the assembly generated.
For instance, with both gcc and clang the two following functions, which are supposed to do the same thing, get compiled to different assembly. f is bad in both cases, g is good in both cases (addl+jo or addl+cmovnol). I don't know if jo is better than cmovnol, but the function g is consistently better than f.
    #include <limits.h>
    signed int f(signed int si_a, signed int si_b) {
        signed int sum;
        if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
            ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
            return 0;
        } else {
            return si_a + si_b;
        }
    }
    signed int g(signed int si_a, signed int si_b) {
        signed int sum;
        if (__builtin_add_overflow(si_a, si_b, &sum)) {
            return 0;
        } else {
            return sum;
        }
    }
A bit like @Andrew's answer, but without the memcpy().
Use a union to avoid the need for memcpy(). With C2x (C23), int is guaranteed to be two's complement.
    int add_wrap(int x, int y) {
        union {
            unsigned un;
            int in;
        } u = {.un = (unsigned) x + (unsigned) y};
        return u.in;
    }
For those who like 1-liners, use a compound literal.
    int add_wrap2(int x, int y) {
        return (union { unsigned un; int in; }) {.un = (unsigned) x + (unsigned) y}.in;
    }
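For comparison, the memcpy() variant that the union replaces might look roughly like this. It is a sketch only, not necessarily the code from @Andrew's answer: the name add_wrap_memcpy is made up, and reinterpreting the bits gives the wrapped sum only if int is two's complement with no padding bits (the preprocessor check covers the padding part).

```c
#include <limits.h>
#include <string.h>

/* Assumption of this sketch: unsigned has exactly one more value bit than int. */
#if UINT_MAX / 2 != INT_MAX
#error "sketch assumes UINT_MAX == 2 * INT_MAX + 1"
#endif

/* Hypothetical name: adds in unsigned arithmetic (which wraps by definition),
   then copies the object representation into an int. */
int add_wrap_memcpy(int x, int y) {
    unsigned sum = (unsigned) x + (unsigned) y;
    int res;
    memcpy(&res, &sum, sizeof res);   /* int and unsigned have the same size */
    return res;
}
```

Both this and the union version reinterpret the same bytes; the union form just expresses it without a library call.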