Tags: floating-point, ieee-754

How does floating-point addition work in "np.finfo(np.float64).max + 1"?


How does addition work in floating-point for this case:

In [6]: np.finfo(np.float64).max + 1
Out[6]: 1.7976931348623157e+308

Why is there no overflow raised?


Solution

  • Overview

    An exception means something exceptional has occurred: an operation has not worked normally. Ordinary floating-point rounding is normal behavior, so by itself it is no reason for an overflow exception.

    Floating-point arithmetic normally rounds its results. When we add a very small number, such as 1e-20, to 2.125 in binary64, the result is 2.125 (using the default rounding method, round-to-nearest, ties-to-even).

    When we add one to the maximum representable finite number, rounding would produce the maximum representable finite number even if we could represent further finite numbers beyond that. So there is no exception here: The arithmetic and the rounding have worked the way floating-point arithmetic normally does.
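
    Both claims can be checked in plain Python. This is a small sketch that assumes Python's float is an IEEE 754 binary64 (true on mainstream platforms) and Python 3.9 or later for math.ulp:

```python
import math
import sys

big = sys.float_info.max          # same value as np.finfo(np.float64).max

# A tiny addend disappears in rounding: the exact sum is closest to 2.125.
small_case = 2.125 + 1e-20

# Near the maximum, neighbouring doubles are 2**971 apart, so adding 1
# moves the exact sum by far less than half that gap.
gap_at_max = math.ulp(big)
big_case = big + 1.0

print(small_case == 2.125)        # True
print(gap_at_max == 2.0 ** 971)   # True
print(big_case == big)            # True
```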

    An overflow exception occurs when rounding produces an infinite result that would not have occurred without rounding. (Sometimes an infinite result is the correct result without rounding, as when we start with infinity and add one. In this case, there is no overflow because the arithmetic has worked normally, giving a correct result.)
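
    Plain Python does not expose the IEEE 754 overflow flag for float addition, so the sketch below applies the definition directly: both operands are finite (so the exact sum is finite) but the computed result is infinite. The helper name addition_overflowed is made up for this illustration.

```python
import math
import sys

def addition_overflowed(a: float, b: float) -> bool:
    """True if a + b is infinite although both operands are finite."""
    return math.isfinite(a) and math.isfinite(b) and math.isinf(a + b)

big = sys.float_info.max

print(addition_overflowed(big, 1.0))        # False: the sum rounds back to big
print(addition_overflowed(math.inf, 1.0))   # False: infinity is the exact result
print(addition_overflowed(big, big))        # True: finite exact sum, infinite result
```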

    Details

    Floating-point arithmetic is specified to produce a result as if two steps were performed:

    • Calculate the infinitely precise mathematical result (as with ordinary real-number arithmetic).
    • Round that result to a value representable in the floating-point format using a specified method (most often a default of round-to-nearest with ties-to-even).
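
    The two steps can be imitated with exact rational arithmetic. This is only a sketch: it relies on CPython's conversion from Fraction to float being correctly rounded (round-to-nearest, ties-to-even), and it is meant for sums that stay within the finite range. The function name add_in_two_steps is hypothetical.

```python
from fractions import Fraction
import sys

def add_in_two_steps(a: float, b: float) -> float:
    # Step 1: exact rational sum; Fraction(float) converts without any error.
    exact = Fraction(a) + Fraction(b)
    # Step 2: round the exact value to the nearest double.
    return float(exact)

big = sys.float_info.max

print(add_in_two_steps(0.1, 0.2) == 0.1 + 0.2)    # True
print(add_in_two_steps(big, 1.0) == big + 1.0)    # True
print(Fraction(big) + 1 == Fraction(big + 1.0))   # False: the exact sum is not representable
```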

    Infinity is a representable result. It is, of course, never the mathematically nearest value to a finite result. However, rounding is specified to behave as if it were done like this:

    • Consider all the values that would be representable if the floating-point exponent were not bounded.
    • Round the real-number result to one of these values using the specified method.
    • If the result is within the exponent bounds (a result actually representable in the format), produce it. If it is above the exponent bounds, produce an infinity (with the proper sign). (If it is below the exponent bounds, there is a potential underflow condition, not discussed further in this answer.)
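
    The three steps above can be modelled for binary64 with exact rationals. This is a sketch under stated assumptions: it handles only positive finite operands, ignores the underflow side entirely, and the names round_unbounded and model_add are invented for the example.

```python
from fractions import Fraction
import math
import sys

P = 53                                  # significand bits in binary64
MAX = Fraction(sys.float_info.max)      # largest finite binary64 value

def round_unbounded(x: Fraction) -> Fraction:
    """Round positive x to P significant bits, ties-to-even, with no exponent limit."""
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    quantum = Fraction(2) ** (e - (P - 1))   # spacing of candidate values near x
    return round(x / quantum) * quantum      # round() on a Fraction is ties-to-even

def model_add(a: float, b: float) -> float:
    """Model a + b for positive finite operands whose sum is not subnormal."""
    rounded = round_unbounded(Fraction(a) + Fraction(b))
    # Step 3: above the exponent bounds means infinity, otherwise the value itself.
    return math.inf if rounded > MAX else float(rounded)

big = sys.float_info.max
half_gap = 2.0 ** 970                   # half the spacing between doubles near big

print(model_add(big, 1.0) == big)                    # True
print(model_add(big, half_gap))                      # inf
print(model_add(big, half_gap) == big + half_gap)    # True: the hardware agrees
```

    Adding half_gap lands exactly halfway between the maximum and 2**1024, the next value an unbounded exponent would allow. The tie goes to the even significand, which is 2**1024, and that is above the exponent bounds.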

    For example, if our format is limited to three-decimal-digit numbers up to 999, then 999 + .4 would produce 999 because 999.4 rounds to 999. And 999 + .6 would produce infinity because 999.6 rounds to 1000, which is outside the format bounds.

    Overflow is specified to occur when the real-number arithmetic result is finite but the rounded result is an infinity.

    For example, if we add infinity and three, there is no overflow because the ordinary mathematical result is infinity, and floating-point arithmetic correctly produces that result. In the 999 + .6 case above, there is overflow because the rounded result is infinity but the real result would be finite. In the 999 + .4 case, there is no overflow because the rounded result, 999, is finite.
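
    Python's decimal module can stand in for the three-digit format, since a Context lets us bound both precision and exponent. In this sketch, the context name toy and the helper add_and_report are made up for illustration. Emax=2 makes 9.99E+2 = 999 the largest finite value, and traps=[] makes conditions set flags instead of raising Python exceptions.

```python
from decimal import Context, Decimal, Overflow, ROUND_HALF_EVEN

# Three significant digits, largest finite value 999, round-half-even.
toy = Context(prec=3, Emin=-2, Emax=2, rounding=ROUND_HALF_EVEN, traps=[])

def add_and_report(a: str, b: str):
    """Return (result, overflow flag) for a + b in the toy format."""
    toy.clear_flags()
    result = toy.add(Decimal(a), Decimal(b))
    return result, bool(toy.flags[Overflow])

print(add_and_report("999", "0.4"))       # (Decimal('999'), False)
print(add_and_report("999", "0.6"))       # (Decimal('Infinity'), True)
print(add_and_report("Infinity", "3"))    # (Decimal('Infinity'), False)
```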

    This is what happens in your case of adding one to the maximum representable finite number: Ordinary rounding produces a finite result, so there is no overflow.

    Note that, if we select round upward (round toward +∞), then adding one to the largest representable finite value does overflow. This is because this rounding method causes infinity to be produced for the floating-point addition, and then we have an exception: The mathematical result is finite, but the floating-point result is infinite, so rounding did not work as it does normally within the finite range.
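
    The Python standard library offers no way to change the rounding mode of binary floats, so this sketch shows the same effect in a three-digit decimal toy format (largest finite value 999) built with the decimal module. Adding 0.1 to 999 plays the role of adding one to the largest double. The helper name add_with is hypothetical.

```python
from decimal import Context, Decimal, Overflow, ROUND_CEILING, ROUND_HALF_EVEN

def add_with(rounding, a: str, b: str):
    """Add in a three-digit toy format (max 999) under the given rounding mode."""
    ctx = Context(prec=3, Emin=-2, Emax=2, rounding=rounding, traps=[])
    result = ctx.add(Decimal(a), Decimal(b))
    return result, bool(ctx.flags[Overflow])

print(add_with(ROUND_HALF_EVEN, "999", "0.1"))   # (Decimal('999'), False)
print(add_with(ROUND_CEILING, "999", "0.1"))     # (Decimal('Infinity'), True)
```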