
How to evaluate -2147483647-1U in C (32-bit program using two's complement arithmetic)?


If the operation is addition, i.e. x + y = z, assuming x = -2147483647 (signed integer), and y = -1U, then -2147483647 + (-1U) = z

What is -1U? Is it signed? Unsigned?


Solution

  • Overflow vs wrap-around, undefined behavior vs well-defined behavior

    Wrap-around, going from 0 to something like UINT_MAX or the other way around, is only well-defined for unsigned numbers.

    If we attempt something like that with signed numbers, we don't get wrap-around but overflow, and the behavior is not defined by the C language. The program is as likely to crash or misbehave as it is to produce a wrap-around effect.

    This is something of a historical defect in the C language: the standard allows (for now) several signed formats, not just two's complement, which is by far the most common one. Overflow does not behave consistently across these formats.

    (Imagine a signed byte with well-defined overflow. Two's complement 127 would realistically overflow into -128, but sign-and-magnitude 127 would overflow into 0, because there is no reason why overflow would affect the sign bit. Similarly, sign-and-magnitude -127, the most negative value in that format, would underflow into -0, whereas two's complement -128 would underflow into 127.)

    On the assembler level, by contrast, overflow in two's complement arithmetic is always well-defined. Most instruction set architectures (ISAs) set an overflow flag when it happens, but the result is otherwise deterministic, unlike in C, where anything can happen.


  • Integer constants, their types and implicit promotion

    Note that every integer constant, such as 2147483647, has a type in C, just like a variable. The default type is int (provided the value fits), but if we append a U or u suffix, the type becomes unsigned.

    Integer constants can never be negative in C. The - is actually the unary minus operator applied to a positive value.

    The C code -2147483647 or -(2147483647) both give the value -2147483647. But the C code -1U or -(1U) cannot result in a negative value, because the type of the integer constant is unsigned int. So assuming 32-bit int, we get a wrap-around effect instead, resulting in the positive value 4294967295.

    Therefore -2147483647 + (-1U) equals -2147483647 + 4294967295U. However, in this expression the left operand is of type int and the right one of type unsigned int. One of the implicit type promotion rules, known as "the usual arithmetic conversions", applies: it states that the signed operand gets converted to the unsigned type. This conversion is well-defined (the value is reduced modulo 2^32, so -2147483647 becomes 4294967296 - 2147483647 = 2147483649), and we end up with the equivalent of 2147483649U + 4294967295U.

    2147483649U + 4294967295U leads to wrap-around and the result is 2147483648U, of unsigned int type.


  • Converting back to signed form

    If we convert this back to int, which is presumably the type of "z" in your example, 2147483648 will not fit inside a 32-bit signed int. The result of the conversion is then implementation-defined, that is, compiler-specific.

    The most reasonable behavior, implemented by most compilers, is to reinterpret the value as its signed representation equivalent, which is -2147483648. In raw form this is 0x80000000, which would also be the raw representation of 2147483648, had it fit.

    So -2147483647 + -1U converted back to signed int is likely to give the same result as -2147483647 + -1, namely -2147483648.