Tags: floating-point, precision

Can the product of two "whole" doubles always be correctly represented?


This seems like a very basic question but my knowledge of floating point numbers is getting hazy. Say I have two doubles (according to IEEE 754) which both exactly represent a whole number, for example 10 or -1234. If I multiply them together, will the result also always be an exact representation of the "correct" result I would have gotten if both numbers were arbitrary precision integers?


Solution

  • No. A trivial proof for any numerical format is to observe that n·n is not representable in the format, where n is the greatest whole number representable in the format, provided 1 < n. (A small DBL_MAX sketch at the end of this answer illustrates the resulting overflow.)

    For a concrete demonstration:

    IEEE-754 binary64 (“double precision”) represents a number as ±F·2^e, where F is a number representable with 53 binary digits and e is an integer within certain limits. (Details about scaling of F and its relationship to e and limits on e are not discussed in this answer.) Observe that the number of significant digits in a binary numeral for ±F·2^e is not affected by e; it is always at most 53 binary digits. (For this purpose, significant digits are those from the first non-zero digit to the last. They do not include zeros that only establish position, such as the zeros in “.0011”.)

    Consider 2^52 + 1. This is representable with 53 binary digits (a 1 followed by 51 0s followed by a 1). It is a whole number. When multiplied by itself, the product is 2^104 + 2·2^52 + 1 = 2^104 + 2^53 + 1. That number requires 105 significant binary digits to represent, so it cannot be represented in the binary64 format.
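
    A minimal sketch of that concrete demonstration, assuming the C `double` type is IEEE-754 binary64: 2^52 + 1 is exactly representable, but its double product with itself rounds to 2^104 + 2^53, so the trailing +1 is lost (the exact product is odd, the computed one is even).

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            double a = 4503599627370497.0;   /* 2^52 + 1, exactly representable */
            double p = a * a;                /* rounded to the nearest double   */

            /* The exact product is 2^104 + 2^53 + 1, which needs 105 significant
               bits; the nearest double is 2^104 + 2^53, so the trailing 1 is lost. */
            double rounded = ldexp(1.0, 104) + ldexp(1.0, 53);

            printf("p == 2^104 + 2^53 (rounded): %s\n", p == rounded ? "yes" : "no");
            printf("p is even (exact product is odd): %s\n",
                   fmod(p, 2.0) == 0.0 ? "yes" : "no");
            return 0;
        }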
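
    And a sketch of the general proof from the first paragraph, under the same binary64 assumption: DBL_MAX, the greatest finite double, is itself a whole number, and multiplying it by itself overflows to infinity, so that product certainly cannot be the exact integer result.

        #include <float.h>
        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            /* DBL_MAX is the largest finite double; it is a whole number. */
            double n = DBL_MAX;
            double p = n * n;   /* exact product is not representable; it overflows */

            printf("n is a whole number: %s\n", n == floor(n) ? "yes" : "no");
            printf("n*n is infinite:     %s\n", isinf(p) ? "yes" : "no");
            return 0;
        }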