
Is there any difference between overflow and implicit conversion, whether technical or bit level (cpu-register-level)?


(I'm a novice, so there may be inaccuracies in what I say.)

In my current mental model, overflow is an arithmetic phenomenon (it occurs when we perform arithmetic operations), and implicit conversion is an assignment phenomenon (initialization or otherwise; it occurs when the right-hand value does not fit into the left-hand type).

However, I often see the concepts 'overflow' and 'implicit conversion' used interchangeably, differently from what I expect. For example, this quote from the learncpp team, talking about overflow and 'bit insufficiency' for signed int:

Integer overflow (often called overflow for short) occurs when we try to store a value that is outside the range of the type. Essentially, the number we are trying to store requires more bits to represent than the object has available. In such a case, data is lost because the object doesn’t have enough memory to store everything [1].

and this, talking about overflow for unsigned int :

What happens if we try to store the number 280 (which requires 9 bits to represent) in a 1-byte (8-bit) unsigned integer? The answer is overflow [2]*

and especially this one, which uses 'modulo wrapping':

Here’s another way to think about the same thing. Any number bigger than the largest number representable by the type simply “wraps around” (sometimes called “modulo wrapping”). 255 is in range of a 1-byte integer, so 255 is fine. 256, however, is outside the range, so it wraps around to the value 0. 257 wraps around to the value 1. 280 wraps around to the value 24 [2].

In such cases, it is said that assignments that exceed the limits of the left-hand side lead to overflow, but in this context I would expect the term 'implicit conversion'.

I also see the term overflow used for arithmetic expressions whose result exceeds the limits of the left-hand side.


1 Is there any technical difference between implicit conversion and overflow/underflow?

I think so. The reference [3], in the section 'Numeric conversions - Integral conversions', says for unsigned integers:

[...] the resulting value is the smallest unsigned value equal to the source value modulo 2^n where n is the number of bits used to represent the destination type [3].

and for signed (bold mine):

If the destination type is signed, the value does not change if the source integer can be represented in the destination type. Otherwise the result is implementation-defined (until C++20) / the unique value of the destination type equal to the source value modulo 2^n, where n is the number of bits used to represent the destination type (since C++20). **(Note that this is different from signed integer arithmetic overflow, which is undefined)** [3].

If we go to the referenced section (Overflows), we find (bold mine):

Unsigned integer arithmetic is always performed modulo 2^n, where n is the number of bits in that particular integer. [...]

**When signed integer arithmetic operation overflows (the result does not fit in the result type), the behavior is undefined** [4].

To me, overflow is clearly an arithmetic phenomenon, and implicit conversion is a phenomenon of assignments whose values do not fit. Is my interpretation accurate?


2 Is there on bit level (cpu) any difference between implicit conversion and overflow?

I think so as well. I'm far from being good at C++, and even more so at assembly, but as an experiment, if we check the output of the code below with MSVC (flag /std:c++20) as MASM (Macro Assembly), especially the flags register, different phenomena occur depending on whether we have an arithmetic operation or an assignment ('implicit conversion').

(I checked the flags register in the debugger of Visual Studio 2022. The assembly below is practically the same as the one from debugging.)

    #include <iostream>
    #include <limits>
        
    int main(void) {
      long long x = std::numeric_limits<long long>::max();   
      int y = x;           
      //
      //
      long long k = std::numeric_limits<long long>::max();      
      ++k;                
    }

The output is:

    y$ = 32
    k$ = 40
    x$ = 48
    main PROC
    $LN3:
      sub rsp, 72 ; 00000048H
      call static __int64 std::numeric_limits<__int64>::max(void) ; 
      std::numeric_limits<__int64>::max
      mov QWORD PTR x$[rsp], rax
      mov eax, DWORD PTR x$[rsp]
      mov DWORD PTR y$[rsp], eax
      call static __int64 std::numeric_limits<__int64>::max(void) 
      ; std::numeric_limits<__int64>::max
      mov QWORD PTR k$[rsp], rax
      mov rax, QWORD PTR k$[rsp]
      inc rax
      mov QWORD PTR k$[rsp], rax
      xor eax, eax
      add rsp, 72 ; 00000048H
      ret 0
    main ENDP

It can be checked at https://godbolt.org/z/6j6G69bTP

The copy-initialization of y in c++ corresponds to that in MASM:

    int y = x;
    mov eax, DWORD PTR x$[rsp]                                        
    mov DWORD PTR y$[rsp], eax

The mov instruction simply ignores the upper 32 bits of x and captures only its low 32 bits: the DWORD PTR operator reads 32 bits of the 64-bit object and stores them in the 32-bit eax register. The mov instruction sets neither the overflow flag nor the carry flag.

The increment of k in c++ corresponds to that in MASM:

    ++k;

    mov rax, QWORD PTR k$[rsp]
    inc rax
    mov QWORD PTR k$[rsp], rax

When the inc instruction executes, the overflow flag (signed overflow) is set to 1.

To me, although conversions can be implemented in different ways, there is a clear difference between conversions using mov variants and arithmetic overflow: arithmetic sets the flags. Is my interpretation accurate?


Notes

  • *Apparently there is some discussion about whether the term overflow applies to unsigned types, but that is not what I am discussing here.

References

[1] https://www.learncpp.com/cpp-tutorial/signed-integers/

[2] https://www.learncpp.com/cpp-tutorial/unsigned-integers-and-why-to-avoid-them/

[3] https://en.cppreference.com/w/cpp/language/implicit_conversion

[4] https://en.cppreference.com/w/cpp/language/operator_arithmetic#Overflows


Solution

  • Let's try to break it down. We have to start with some more terms.

    Ideal Arithmetic

    Ideal arithmetic refers to arithmetic as it takes place in mathematics where the involved numbers are true integers with no limit to their size. When implementing arithmetic on a computer, integer types are generally limited in their size and can only represent a limited range of numbers. The arithmetic between these is no longer ideal in the sense that some arithmetic operations can result in values that are not representable in the types you use for them.

    Carry out

    A carry out occurs when, in an addition, there is a carry out of the most significant bit. In architectures with flags, this commonly causes the carry flag to be set. When calculating with unsigned numbers, the presence of a carry out indicates that the result did not fit into the number of bits of the output register and hence does not represent the ideal arithmetic result.

    The carry out is also used in multi-word arithmetic to carry the 1 between the words that make up the result.

    Overflow

    On a two's complement machine, an integer overflows when the carry out of an addition is not equal to the carry into the final bit. In architectures with flags, this commonly causes the overflow flag to be set. When calculating with signed numbers, the presence of overflow indicates that the result did not fit into the output register and hence does not represent the ideal arithmetic result.

    With regards to "the result does not fit," it's like a carry out for signed arithmetic. However, when using multi-word arithmetic on signed numbers, you still need to use the normal carry out to carry the one to the next word.

    Some authors call carry out "unsigned overflow" and overflow "signed overflow." The idea here is that in such a nomenclature, overflow refers to any condition in which the result of an operation is not representable. Other kinds of overflows include floating-point overflow, handled on IEEE-754 machines by saturating to ±Infinity.

    Conversion

    Conversion refers to taking a value represented by one data type and representing it in another data type. When the data types involved are integer types, this is usually done by extension, truncation, saturation, or reinterpretation:

    • extension is used to convert types into types with more bits and refers to just adding more bits past the most significant bit. For unsigned numbers, zeroes are added (zero extension). For signed numbers, copies of the sign bit are added (sign extension). Extension always preserves the value extended.
    • truncation is used to convert types into types of less bits and refers to removing bits from the most significant bit until the desired width is reached. If the value is representable in the new type, it is unchanged. Otherwise it will be changed as if by modulo reduction.
    • saturation is used to convert types into types of the same amount or less bits and works like truncation, but if the value is not representable, it is replaced by the smallest (if less than 0) or largest (if greater than 0) value of the destination type.
    • reinterpretation is used to convert between types of the same amount of bits and refers to interpreting the bit pattern of the original type as the new type. Values that are representable in the new type are preserved when doing this between signed and unsigned types. (For example, the bit-pattern for a non-negative signed 32 bit integer represents the same number when interpreted as an unsigned 32 bit integer.)

    An implicit conversion is just a conversion that happens without being explicitly spelled out by the programmer. Some languages (like C) have these, others don't.

    When an attempt is made to convert from one type or another and the result is not representable, some authors too refer to this situation as “overflow,” like with “signed overflow” and “unsigned overflow.” It is however a different phenomenon caused by a change in bit width and not a result of arithmetic. So yes, your interpretation is accurate. These are two separate phenomena related through the common idea of “resulting value doesn't fit type.”

    To see how the two are interlinked, you may also interpret addition of two n bit numbers as resulting in a temporary n + 1 bit number such that the addition is always ideal. Then, the result is truncated to n bit and stored in the result register. If the result is not representable, then either carry out or overflow occurred, depending on the desired signedness. The carry out bit is then exactly the most significant bit of the temporary result that is then discarded to reach the final result.

    Question 2

    To me, although you can implement (mov) conversions in different ways, there is a clear difference between conversions using mov variants and arithmetic overflow: arithmetic sets the flags. Is my interpretation accurate?

    The interpretation is not correct and the presence of flags is a red herring. There are both architectures where data moves set flags (e.g. ARMv6-M) and architectures where arithmetic doesn't set flags (e.g. x86 when using the lea instruction to perform it) or that do not even have flags (e.g. RISC-V).

    Note also that a conversion (implicit or not) does not necessarily have to result in an instruction. Sign extension and saturation usually do, but zero extension is often implemented by just ensuring that the top part of a register is clear, which the CPU may be able to do as a side effect of other operations you want to perform anyway. Truncation may be implemented by just ignoring the top part of the register. Reinterpretation, by its nature, generally does not generate any code either.

    As for carry out and overflow, whether they occur depends on the values you perform arithmetic with. These are things that just happen, and unless you want to detect that they happen, no code is needed for them. It's simply the default behavior.