For IEEE-754 floating point arithmetic, is the mantissa in [0.5, 1) or in [1, 2)?

I was looking at several textbooks, including Numerical Linear Algebra by Trefethen and Bau, and in their sections on floating-point arithmetic they seem to say that in IEEE-754, normalized floating-point numbers take the form 0.1… × 2^e (in binary). That is, the mantissa is assumed to be between 0.5 and 1.

However, in this popular online floating point calculator, it is explained that normalized floating point numbers have a mantissa between 1 and 2.

Could someone please tell me which is the correct way?

Solution

Both are correct, and so is any other placement of the radix point. For the IEEE-754 binary32 format, the following sets are identical:

{ (−1)^s•f•2^e | s ∈ {0, 1}, f is the value of a 24-bit binary numeral with a radix point after the first digit, and e is an integer such that −126 ≤ e ≤ 127 }.
{ (−1)^s•f•2^e | s ∈ {0, 1}, f is the value of a 24-bit binary numeral with a radix point before the first digit, and e is an integer such that −125 ≤ e ≤ 128 }.
{ (−1)^s•f•2^e | s ∈ {0, 1}, f is the value of a 24-bit binary numeral with a radix point after the last digit, and e is an integer such that −149 ≤ e ≤ 104 }.
{ f•2^e | f is an integer such that |f| < 2²⁴, and e is an integer such that −149 ≤ e ≤ 104 }.

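In other words, we may put the radix point anywhere in the significand, provided we adjust the range of the exponent to compensate. Moving the radix point one place to the left halves f, so both exponent bounds must increase by one to keep the same values (compare the first and second forms); moving it 23 places to the right multiplies f by 2²³, so both bounds decrease by 23 (the third form). Which form to use is a matter of convenience or preference.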
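A brute-force check makes the set equality concrete. The Python sketch below is a minimal illustration under an assumed toy format (a 4-bit significand and scientific-form exponents −2 through 3, standing in for binary32's 24 bits and −126 through 127, which would be far too many values to enumerate); `form_set` and `integer_form_set` are hypothetical helpers. It builds all four sets with exact `Fraction` arithmetic, shifting the exponent bounds exactly as the four forms do, and compares them.

```python
from fractions import Fraction

# Toy parameters (an assumption, chosen only so every value can be enumerated):
# a 4-bit significand and scientific-form exponents -2..3.
# binary32 would be P = 24, EMIN = -126, EMAX = 127.
P, EMIN, EMAX = 4, -2, 3


def form_set(point_shift, exp_lo, exp_hi):
    """{(-1)^s * f * 2^e}: f is a P-bit numeral divided by 2**point_shift."""
    values = set()
    for digits in range(2 ** P):  # every P-bit numeral, leading zeros allowed
        f = Fraction(digits, 2 ** point_shift)
        for e in range(exp_lo, exp_hi + 1):
            for s in (0, 1):
                values.add((-1) ** s * f * Fraction(2) ** e)
    return values


def integer_form_set(exp_lo, exp_hi):
    """{f * 2^e}: f is a signed integer with |f| < 2**P."""
    return {Fraction(f) * Fraction(2) ** e
            for f in range(-(2 ** P) + 1, 2 ** P)
            for e in range(exp_lo, exp_hi + 1)}


form1 = form_set(P - 1, EMIN, EMAX)                   # point after the first digit
form2 = form_set(P, EMIN + 1, EMAX + 1)               # point before the first digit
form3 = form_set(0, EMIN - P + 1, EMAX - P + 1)       # point after the last digit
form4 = integer_form_set(EMIN - P + 1, EMAX - P + 1)  # sign folded into the integer

print(form1 == form2 == form3 == form4)  # True
```

Leading zeros are allowed in the numerals on purpose, matching the sets above: that is how subnormal values end up in every form.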

The third form scales the significand so it is an integer, and the fourth form also folds the sign into the significand. This last form is convenient for applying number theory to analyze floating-point behavior.
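As an illustration of the fourth form, the sketch below decodes a binary32 bit pattern into a signed integer f and exponent e with x = f•2^e. It is a minimal Python example: `binary32_integer_form` is a hypothetical helper, and it relies on `struct`'s `'f'` format to round a Python float to binary32. Once a value is in this form, simple number theory applies: its exact value has a power-of-two denominator, so the binary32 number nearest 0.1 cannot equal 1/10.

```python
import struct
from fractions import Fraction


def binary32_integer_form(x):
    """Return (f, e) with x == f * 2**e exactly, |f| < 2**24, -149 <= e <= 104.

    x is first rounded to binary32 by struct; infinities and NaNs are rejected.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exp == 0xFF:
        raise ValueError("infinity or NaN has no finite integer form")
    if biased_exp == 0:   # zero or subnormal: no implicit leading 1
        f, e = fraction, -149
    else:                 # normal: restore the implicit leading 1
        f, e = fraction | (1 << 23), biased_exp - 150
    return (-f if sign else f), e


f, e = binary32_integer_form(0.1)
print(f, e)  # 13421773 -27
# The denominator of f * 2**e is a power of two, so it cannot equal 1/10.
print(Fraction(f) * Fraction(2) ** e == Fraction(1, 10))  # False
```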

IEEE 754 mostly uses the first form. It refers to this as “a scientific form,” reflecting the fact that, in scientific notation, we commonly write numbers with a radix point just after the first digit, as in “The mass of the Earth is about 5.9722•10²⁴ kg.” In clause 3.3, IEEE 754-2008 mentions “It is also convenient for some purposes to view the significand as an integer; in which case the finite floating-point numbers are described thus:”, followed by text equivalent to the third form above except that it is generalized (the base and other parameters are arbitrary values for any floating-point format rather than the constants I used above specifically for the binary32 format).
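The generalization to other formats can be computed rather than memorized: given a format's precision p and its scientific-form exponent range [emin, emax], each shift of the radix point moves both exponent bounds by the same amount. A minimal Python sketch, with `Format` and `exponent_ranges` as hypothetical names, using the standard parameters for binary32 (p = 24, emax = 127) and binary64 (p = 53, emax = 1023), with emin = 1 − emax in both cases:

```python
from typing import NamedTuple


class Format(NamedTuple):
    name: str
    p: int     # precision: number of significand digits, including the leading one
    emin: int  # exponent range when the radix point follows the first digit
    emax: int


def exponent_ranges(fmt):
    """Exponent bounds that keep the set of values fixed as the radix point moves."""
    return {
        "after first digit": (fmt.emin, fmt.emax),
        "before first digit": (fmt.emin + 1, fmt.emax + 1),
        # Moving the point p - 1 places right multiplies the significand by 2**(p - 1).
        "after last digit": (fmt.emin - (fmt.p - 1), fmt.emax - (fmt.p - 1)),
    }


for fmt in (Format("binary32", 24, -126, 127), Format("binary64", 53, -1022, 1023)):
    for placement, (lo, hi) in exponent_ranges(fmt).items():
        print(f"{fmt.name}: point {placement}, e in [{lo}, {hi}]")
```

For binary32 this reproduces the ranges used in the sets above: [−126, 127], [−125, 128], and [−149, 104].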

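The C standard describes numbers in the second form (for any base, not necessarily base two), with the radix point before the first digit. Its frexp function uses the same scale, returning a fraction in [1/2, 1) together with the matching exponent, and for binary32 the <float.h> limits FLT_MIN_EXP and FLT_MAX_EXP are −125 and 128, exactly the exponent range of the second form above.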
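Python's `math.frexp` follows the same convention as C's `frexp`, so a short Python sketch can show both scalings of the same value side by side; converting to the [1, 2) convention is just a matter of doubling m and decrementing e. `sys.float_info.min_exp` and `max_exp` correspond to C's DBL_MIN_EXP and DBL_MAX_EXP, which use the same scale; the values noted in the comment assume a platform whose float is IEEE-754 binary64.

```python
import math
import sys

for x in (1.0, 6.0, 0.1, -3.0):
    m, e = math.frexp(x)  # x == m * 2**e exactly, with 0.5 <= |m| < 1
    assert 0.5 <= abs(m) < 1 and math.ldexp(m, e) == x
    # Same value with the significand in [1, 2): double m, decrement e.
    print(f"{x} = {m} * 2**{e} = {2 * m} * 2**{e - 1}")

# C's DBL_MIN_EXP and DBL_MAX_EXP, on the same [0.5, 1) scale as frexp.
print(sys.float_info.min_exp, sys.float_info.max_exp)  # -1021 1024 for IEEE binary64
```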