python floating-point precision ieee-754

Why does 2**-1025 != 0.0 in Python

According to the IEEE 754 specification, a 64-bit float has 11 bits for the exponent and 52 bits for the mantissa. Hence, the smallest number that can be encoded as a float should be 2**(-2**10). The Wikipedia page, which I believe is accurate, gives a more exact value of 2**-1022, whose decimal value is approximately 2.2250738585072014e-308.

But in Python, I can use floats such as 2**-1052. The actual limit on my computer is 2**-1074. According to this page of the official documentation, Python usually conforms to IEEE 754.

At the same time, the maximal value is 2**1023, which is the value given by the IEEE 754 standard.

Why is it so?
Does anyone have an explanation?
What is the actual encoding of a float in Python?
Why is the range of the exponent, which is equal to 1074+1023+1, i.e. 2098, not a power of 2?

Solution

First, the Python documentation does not specify that IEEE-754 is used. The choice of floating-point implementation is up to each Python implementation. IEEE-754 is very popular but not universal.
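Whether a given interpreter uses binary64 can be checked at run time. The following is a minimal sketch, assuming that matching the parameters reported by `sys.float_info` is a good enough indication; `looks_like_binary64` is a hypothetical helper name, not something Python provides.

```python
import sys

def looks_like_binary64() -> bool:
    """Heuristic: do the float parameters match IEEE-754 binary64?"""
    info = sys.float_info
    return (
        info.radix == 2          # binary format
        and info.mant_dig == 53  # 52 stored bits plus the implicit leading bit
        and info.max_exp == 1024   # C convention: largest e with 2**(e-1) finite
        and info.min_exp == -1021  # C convention: smallest e with 2**(e-1) normal
    )

print(looks_like_binary64())
```

On common platforms running CPython this is expected to print `True`, but that is a property of the platform, not a guarantee of the language.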

In the IEEE-754 basic 64-bit binary floating-point format (binary64), there is a one-bit sign field s, an eleven-bit exponent field e, and a 52-bit primary significand field f (for “fraction”).

The sign bit is 0 for positive and 1 for negative.

If e is all one bits (11111111111, or 2047 in decimal), the object represents an infinity (if f is zero) or a NaN (if f is not zero). (“NaN” stands for “Not a Number”.)

If e is neither all zero bits nor all one bits, then it represents an exponent E = e − 1023, and the f field is used to form a number F that is 1.f (that is, the binary numeral “1.” followed by the 52 bits of f). (Equivalently, if we regard f as a binary numeral, we can say F = 1 + f • 2⁻⁵².) The number represented is (−1)^s • 2^E • F. These are called normal numbers.

If e is all zero bits, then it represents an exponent E = 1 − 1023, and the f field is used to form a number F that is 0.f (that is, the binary numeral “0.” followed by the 52 bits of f). (Equivalently, we can say F = 0 + f • 2⁻⁵².) Again, the number represented is (−1)^s • 2^E • F. These are called subnormal numbers.

Note that e = 1 and e = 0 represent the same exponent E, 1 − 1023 = −1022, but change the first bit of F from 1 to 0. F is the significand (the fraction part) of the floating-point number. (People sometimes refer to f as the significand¹, but that is incorrect. It is only the field that provides most of the encoding of the mathematical significand. As we see above, the exponent field also contributes to forming the actual significand, F.)
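The decoding rules above can be sketched in Python. This assumes the interpreter's floats are binary64; `decode_binary64` and `value_from_fields` are hypothetical helper names chosen for illustration. `Fraction` is used so that the reconstructed value is exact and can be compared with the original float.

```python
import math
import struct
from fractions import Fraction

def decode_binary64(x: float) -> tuple:
    """Split a float into its sign, exponent, and fraction fields (s, e, f)."""
    # Reinterpret the 8 bytes of the binary64 encoding as an unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    s = bits >> 63               # 1-bit sign field
    e = (bits >> 52) & 0x7FF     # 11-bit exponent field
    f = bits & ((1 << 52) - 1)   # 52-bit primary significand field
    return s, e, f

def value_from_fields(s: int, e: int, f: int) -> Fraction:
    """Exact value of a finite number, following the rules described above."""
    if e == 0x7FF:
        raise ValueError("infinity or NaN has no finite value")
    if e == 0:
        E, F = 1 - 1023, Fraction(f, 2**52)       # subnormal: F = 0.f
    else:
        E, F = e - 1023, 1 + Fraction(f, 2**52)   # normal: F = 1.f
    return (-1) ** s * Fraction(2) ** E * F

print(decode_binary64(1.0))                      # (0, 1023, 0)
print(decode_binary64(math.ldexp(1.0, -1074)))   # (0, 0, 1)
```

For any finite float `x`, `value_from_fields(*decode_binary64(x))` should equal `Fraction(x)`.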

The smallest positive number occurs when s is zero, e is zero, and f is 1 (0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001). Then E = −1022 and F = 0 + 1 • 2⁻⁵², so the number represented is (−1)⁰ • 2⁻¹⁰²² • 2⁻⁵² = 2⁻¹⁰⁷⁴.
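This can be checked from Python with a short sketch, assuming binary64 floats and the default round-to-nearest, ties-to-even rounding mode. The names `tiny` and `from_bits` are illustrative only.

```python
import math
import struct
import sys

# Smallest positive subnormal: 2**-1074, built exactly with ldexp.
tiny = math.ldexp(1.0, -1074)

# The same value obtained from the bit pattern s = 0, e = 0, f = 1.
(from_bits,) = struct.unpack(">d", (1).to_bytes(8, "big"))

print(tiny == from_bits)          # True
print(tiny > 0.0)                 # True
print(2**-1025 != 0.0)            # True: 2**-1025 is a subnormal, not zero
print(tiny / 2 == 0.0)            # True: 2**-1075 is a tie and rounds to zero
print(sys.float_info.min == math.ldexp(1.0, -1022))  # True: smallest normal
```

Note that `sys.float_info.min` is the smallest positive normal number, 2⁻¹⁰²², which is why it matches the Wikipedia figure rather than 2⁻¹⁰⁷⁴.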

The largest finite number occurs when s is zero, e is 2046 (11111111110), and f is all ones (1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111). E = 2046 − 1023 = 1023. Note that f, interpreted as an integer, is 2⁵²−1, so F = 1 + (2⁵²−1) • 2⁻⁵² = 1 + 1 − 2⁻⁵² = 2 − 2⁻⁵². So the value represented is (−1)⁰ • 2¹⁰²³ • (2 − 2⁻⁵²) = 2¹⁰²⁴ − 2⁹⁷¹.
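The same kind of check works for the largest finite value. This sketch assumes binary64 floats; Python integers are arbitrary precision, so 2¹⁰²⁴ − 2⁹⁷¹ can be computed exactly and compared with `sys.float_info.max`. The names `largest` and `from_bits` are illustrative only.

```python
import struct
import sys

# Exact integer value of the largest finite binary64 number.
largest = 2**1024 - 2**971

# The same value from the bit pattern s = 0, e = 2046, f = all ones.
(from_bits,) = struct.unpack(">d", bytes.fromhex("7fefffffffffffff"))

print(from_bits == sys.float_info.max)     # True
print(int(sys.float_info.max) == largest)  # True
print(float(largest) == sys.float_info.max)  # True: the integer is representable
```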

Footnote

¹ The significand is sometimes referred to as the “mantissa,” but that is an old term for the fraction portion of a logarithm. It is not entirely appropriate for floating-point numbers, as mantissas are logarithmic while significands are linear.