
Is IEEE 754 floating point representation wasting memory?


I always thought that there are 2^64 different fractional values that can be stored by a variable of type double. (Each bit can have either 1 or 0 as value and so 2^64 different values).

Recently I learned that NaN (not a number) is represented by an exponent field of 11111111111 together with any non-zero significand. What if, instead, NaN were represented only by an exponent field of 11111111111 and a significand of 111111......(52 times)?

Won't this allow us to represent 2^52 more different numbers? And 2^52 is a huge number. So are we not wasting the valuable space?


Solution

  • The IEEE-754 floating-point formats were designed with efficient hardware implementation in mind. All the special input operands can be detected by examining the exponent field only, which is either all-0 (zeros and denormals) or all-1 (infinities and NaNs). So for double precision specifically, only an 11-bit comparator is required, and the check can be performed in a fraction of a processor cycle.
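As a rough illustration of the exponent-only check, here is a minimal Python sketch. It assumes that Python's `float` is an IEEE-754 binary64 value (true for CPython on common platforms); the helper names `exponent_field` and `is_special` are made up for this example and are not part of any library.

```python
import struct

EXP_MASK = 0x7FF  # the 11 exponent bits of a binary64 value


def exponent_field(x: float) -> int:
    """Return the raw (biased) 11-bit exponent field of a binary64 float."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & EXP_MASK


def is_special(x: float) -> bool:
    """True for zeros, denormals, infinities and NaNs.

    Only the exponent field is inspected, mirroring the hardware check:
    all-0 means zero or denormal, all-1 means infinity or NaN.
    """
    e = exponent_field(x)
    return e == 0 or e == EXP_MASK


print(exponent_field(1.0))         # 1023 (the exponent bias)
print(is_special(1.5))             # False
print(is_special(5e-324))          # True (smallest denormal)
print(is_special(float("inf")))    # True
```

Telling a NaN from an infinity, or a zero from a denormal, additionally requires looking at the fraction field, but the fast "is this operand special at all?" decision does not.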

    Reserving one of 2048 possible exponent encodings for infinities and NaNs is not particularly wasteful. Note that IEEE-754 uses two different kinds of NaNs: signalling NaNs, or SNaNs, raise an invalid-operation exception when used as an operand, while quiet NaNs, or QNaNs, are simply propagated through computation until they appear in human-consumable final results. The most significant bit of the mantissa field distinguishes between the two kinds of NaNs: in the encoding recommended by IEEE 754-2008 and used by most current hardware, it is cleared for SNaNs and set for QNaNs.
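To put a number on "not particularly wasteful", the following sketch counts the binary64 bit patterns that encode NaNs and tests the quiet bit. It works on raw 64-bit integers rather than on `float` objects, because some platforms may quiet a signalling NaN when it is loaded as a floating-point value. The helper names are hypothetical, and the quiet-bit convention is the "set means quiet" one described above.

```python
FRACTION_BITS = 52
EXP_ALL_ONES = 0x7FF << FRACTION_BITS       # exponent field 11111111111
FRACTION_MASK = (1 << FRACTION_BITS) - 1
QUIET_BIT = 1 << (FRACTION_BITS - 1)        # most significant fraction bit

# Every pattern with an all-ones exponent and a non-zero fraction is a NaN,
# for either value of the sign bit.
nan_patterns = 2 * (2**FRACTION_BITS - 1)
share = nan_patterns / 2**64                # just under 1/2048


def is_nan_bits(bits: int) -> bool:
    """True if the 64-bit pattern encodes a NaN (not an infinity)."""
    return (bits & EXP_ALL_ONES) == EXP_ALL_ONES and (bits & FRACTION_MASK) != 0


def is_quiet_bits(bits: int) -> bool:
    """True if the pattern is a NaN with the quiet bit set."""
    return is_nan_bits(bits) and (bits & QUIET_BIT) != 0


print(nan_patterns)                          # 9007199254740990
print(share < 1 / 2048)                      # True
print(is_quiet_bits(0x7FF8000000000000))     # True  (a quiet NaN)
print(is_quiet_bits(0x7FF0000000000001))     # False (a signalling NaN)
```

So the NaN encodings take up slightly less than 0.05% of the 2^64 available bit patterns.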

    Additionally, IEEE-754 supports, but does not require, the concept of a NaN "payload", i.e. multiple NaN encodings with system- or user-defined meanings. For example, "PowerPC Numerics" (Apple 1994) specifies for the Macintosh system that the 8th through 15th most significant bits of the fraction field of a NaN contain a NaN code which indicates the origin of the NaN, e.g. sqrt() of a negative number other than zero, log() of a negative number, or an invalid argument to an inverse trigonometric function such as asin(). The concept was already used by SANE (Standard Apple Numerical Environment), introduced with the Apple II, as described in "Apple Numerics Manual, Second Edition" (Apple 1988).
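The payload idea can be sketched in a few lines of Python. This is an illustration only: the helper names are invented, the bit positions simply apply the "8th through 15th most significant fraction bits" description above to the 52-bit fraction of a binary64 value, and the code value used is arbitrary rather than taken from Apple's documentation.

```python
import math
import struct

FRACTION_BITS = 52
EXP_ALL_ONES = 0x7FF << FRACTION_BITS
QUIET_BIT = 1 << (FRACTION_BITS - 1)
PAYLOAD_MASK = QUIET_BIT - 1           # the 51 fraction bits below the quiet bit

# Assumption: counting from the top of the fraction field, the 8th..15th
# most significant bits are bits 44 down to 37 of the 64-bit pattern.
CODE_SHIFT = FRACTION_BITS - 15        # 37


def make_qnan_bits(payload: int) -> int:
    """Build the bit pattern of a quiet NaN carrying the given payload."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    return EXP_ALL_ONES | QUIET_BIT | payload


def payload_of(bits: int) -> int:
    """Extract the payload (fraction bits below the quiet bit)."""
    return bits & PAYLOAD_MASK


def with_nan_code(code: int) -> int:
    """Quiet NaN pattern with an 8-bit code in the assumed code field."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code does not fit in 8 bits")
    return make_qnan_bits(code << CODE_SHIFT)


def nan_code(bits: int) -> int:
    """Read the 8-bit code back out of a NaN bit pattern."""
    return (bits >> CODE_SHIFT) & 0xFF


bits = with_nan_code(0x24)             # 0x24 is an arbitrary example code
value = struct.unpack("<d", struct.pack("<Q", bits))[0]
print(math.isnan(value))               # True
print(hex(nan_code(bits)))             # 0x24
```

Whether a payload survives arithmetic, conversions between formats, or a round trip through a `float` object depends on the hardware and the language runtime, so portable code should not rely on it.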

    The C and C++ standards provide the function nan() (declared in math.h / cmath), which returns a quiet NaN whose payload is constructed from a string argument in an implementation-defined manner.