Tags: c++, floating-point, cpu, denormal-numbers, subnormal-numbers

What is this "denormal data" about? - C++


I would like a broad overview of "denormal data" and what it is about. The only thing I think I have right is that, from a programmer's viewpoint, it is related to floating-point values, and from the CPU's standpoint, it is related to general-purpose computing.

Can someone explain these two words for me?

EDIT

Please remember that I am focused on C++ applications and only the C++ language.


Solution

  • You ask about C++, but the specifics of floating-point values and encodings are determined by a floating-point specification, notably IEEE 754, and not by C++. IEEE 754 is by far the most widely used floating-point specification, and I will answer using it.

    In IEEE 754, binary floating-point values are encoded with three parts: a sign bit s (0 for positive, 1 for negative), a biased exponent e (the represented exponent plus a fixed offset), and a significand field f (the fraction portion). For normal numbers, these represent exactly the number (−1)^s • 2^(e−bias) • 1.f, where 1.f is the binary numeral formed by writing the significand bits after “1.”. (For example, if the significand field has the ten bits 0010111011, it represents the significand 1.0010111011₂, which is 1.1826171875 or 1211/1024.)
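
The three fields described above can be pulled out of a double with a few shifts and masks. This is a minimal sketch, assuming that double is the 64-bit IEEE 754 binary format (true on most platforms, but not guaranteed by C++); the names DoubleFields, decode, and check_decode are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three raw fields of a 64-bit IEEE 754 value.
struct DoubleFields {
    std::uint64_t sign;      // 1 bit: 0 for positive, 1 for negative
    std::uint64_t exponent;  // 11 bits, biased
    std::uint64_t fraction;  // 52 bits, the "f" in 1.f or 0.f
};

// Assumption: double is the 64-bit IEEE 754 binary format.
inline DoubleFields decode(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy the bytes instead of casting pointers
    const std::uint64_t fraction_mask = (std::uint64_t{1} << 52) - 1;
    return DoubleFields{bits >> 63, (bits >> 52) & 0x7FF, bits & fraction_mask};
}

// Sanity check with the significand from the text: 1211/1024 is 1.0010111011 in binary.
inline void check_decode() {
    const DoubleFields f = decode(1211.0 / 1024.0);
    assert(f.sign == 0);
    assert(f.exponent == 1023);          // actual exponent 0 plus the bias 1023
    assert((f.fraction >> 42) == 0xBB);  // top ten fraction bits: 0010111011
}
```

Calling check_decode() from main runs the checks; decode(-1.0) would give sign 1, exponent 1023, and fraction 0.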

    The bias depends on the floating-point format. For 64-bit IEEE 754 binary, the exponent field has 11 bits, and the bias is 1023. When the actual exponent is 0, the encoded exponent field is 1023. Actual exponents of −2, −1, 0, 1, and 2 have encoded exponents of 1021, 1022, 1023, 1024, and 1025. When somebody speaks of the exponent of a subnormal number being zero, they mean the encoded exponent is zero. If the value were written in normalized form (with a leading 1), its actual exponent would be less than −1022. For 64-bit, the normal exponent interval is −1022 to 1023 (encoded values 1 to 2046). When the exponent moves outside this interval, special things happen.
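
The relationship between actual and encoded exponents can be checked directly. The following is a sketch under the assumption that double is the 64-bit IEEE 754 binary format; encoded_exponent, actual_exponent, and check_bias are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Returns the 11-bit biased exponent field of x.
// Assumption: double is the 64-bit IEEE 754 binary format.
inline int encoded_exponent(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<int>((bits >> 52) & 0x7FF);
}

// For normal numbers only: subtract the bias of 1023 to get the actual exponent.
inline int actual_exponent(double x) {
    const int e = encoded_exponent(x);
    assert(e >= 1 && e <= 2046);  // outside this range the value is not a normal number
    return e - 1023;
}

// The encoded exponents listed in the text, plus both ends of the normal interval.
inline void check_bias() {
    assert(encoded_exponent(0.25) == 1021);  // actual exponent -2
    assert(encoded_exponent(0.5) == 1022);   // actual exponent -1
    assert(encoded_exponent(1.0) == 1023);   // actual exponent 0
    assert(encoded_exponent(2.0) == 1024);   // actual exponent 1
    assert(encoded_exponent(4.0) == 1025);   // actual exponent 2
    assert(encoded_exponent(std::numeric_limits<double>::min()) == 1);     // 2^-1022
    assert(encoded_exponent(std::numeric_limits<double>::max()) == 2046);  // just under 2^1024
}
```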

    Above this exponent interval, floating-point stops representing finite numbers. An encoded exponent of 2047 (all 1 bits) represents infinity (with the significand field set to zero). Below this exponent interval, floating-point changes to subnormal numbers. When the encoded exponent is zero, the significand field represents 0.f instead of 1.f.
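
From C++, both edges of the normal interval can be observed without touching the bits, using std::fpclassify and std::numeric_limits from the standard library. This is only a sketch; category and check_edges are hypothetical names, and the results assume IEEE 754 doubles with default compiler settings (no fast-math or flush-to-zero options).

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>

// Names the category that std::fpclassify reports for x.
inline std::string category(double x) {
    switch (std::fpclassify(x)) {
        case FP_ZERO:      return "zero";
        case FP_SUBNORMAL: return "subnormal";
        case FP_NORMAL:    return "normal";
        case FP_INFINITE:  return "infinite";
        case FP_NAN:       return "nan";
        default:           return "unknown";
    }
}

inline void check_edges() {
    using limits = std::numeric_limits<double>;
    // Going above the normal interval leaves the finite numbers.
    assert(category(limits::max()) == "normal");
    assert(category(limits::max() * 2.0) == "infinite");
    // Going below it enters the subnormal numbers instead of jumping to zero.
    assert(category(limits::min()) == "normal");
    assert(category(limits::min() / 2.0) == "subnormal");
    // The smallest subnormal is 2^-1022 * 0.00...01 (52 fraction bits), i.e. 2^-1074.
    assert(limits::denorm_min() == std::ldexp(1.0, -1074));
}
```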

    There is an important reason for this. If the lowest exponent value were just another normal encoding, then the lower bits of its significand would be too small to represent as floating-point values by themselves. Without that leading “1.”, there would be no way to say where the first 1 bit was. For example, suppose you had two numbers, both with the lowest exponent, and with significands 1.0010111011₂ and 1.0000000000₂. When you subtract the significands, the result is .0010111011₂. Unfortunately, there is no way to represent this as a normal number. Because you were already at the lowest exponent, you cannot represent the lower exponent that is needed to say where the first 1 is in this result. Since the mathematical result is too small to be represented, a computer would be forced to return the nearest representable number, which would be zero.

    This creates the undesirable property in the floating-point system that you can have a != b but a-b == 0. To avoid that, subnormal numbers are used. By using subnormal numbers, we have a special interval where the actual exponent does not decrease, and we can perform arithmetic without creating numbers too small to represent. When the encoded exponent is zero, the actual exponent is the same as when the encoded exponent is one, but the value of the significand changes to 0.f instead of 1.f. When we do this, a != b guarantees that the computed value of a-b is not zero.
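
The subtraction example from the text can be reproduced with two doubles at the lowest normal exponent. This is a sketch assuming IEEE 754 doubles with subnormal support enabled (the default on mainstream compilers and hardware); difference and check_gradual_underflow are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

inline double difference(double a, double b) { return a - b; }

inline void check_gradual_underflow() {
    // Both values use the lowest normal exponent, -1022.
    const double b = std::numeric_limits<double>::min();  // 2^-1022 * 1.0000000000 (binary)
    const double a = b * (1211.0 / 1024.0);               // 2^-1022 * 1.0010111011 (binary)
    const double d = difference(a, b);

    assert(a != b);
    assert(d != 0.0);                            // a != b, so a - b is not zero
    assert(std::fpclassify(d) == FP_SUBNORMAL);  // the result is a subnormal number
    assert(d == b * (187.0 / 1024.0));           // exactly 2^-1022 * 0.0010111011 (binary)
}
```

On hardware or with compiler options that flush subnormal numbers to zero, the second and third assertions would be expected to fail, which is exactly the a != b but a-b == 0 situation described above.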

    Here are the combinations of values in the encodings of 64-bit IEEE 754 binary floating-point:

    Sign  Exponent (e)  Significand Bits (f)       Meaning
    0     0             0                          +zero
    0     0             Non-zero                   +2^−1022 • 0.f (subnormal)
    0     1 to 2046     Anything                   +2^(e−1023) • 1.f (normal)
    0     2047          0                          +infinity
    0     2047          Non-zero but high bit off  +, signaling NaN
    0     2047          High bit on                +, quiet NaN
    1     0             0                          −zero
    1     0             Non-zero                   −2^−1022 • 0.f (subnormal)
    1     1 to 2046     Anything                   −2^(e−1023) • 1.f (normal)
    1     2047          0                          −infinity
    1     2047          Non-zero but high bit off  −, signaling NaN
    1     2047          High bit on                −, quiet NaN
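
The table translates almost line for line into a classifier over the raw bits. This is an illustrative sketch, assuming a 64-bit IEEE 754 double and the quiet/signaling convention shown in the table (high significand bit on means quiet); classify_bits and classify are hypothetical names, and the sketch uses an ASCII "-" for the negative sign.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Classifies a raw 64-bit pattern according to the table above.
inline std::string classify_bits(std::uint64_t bits) {
    const bool negative = (bits >> 63) != 0;
    const std::uint64_t e = (bits >> 52) & 0x7FF;                    // encoded exponent
    const std::uint64_t f = bits & ((std::uint64_t{1} << 52) - 1);   // significand bits
    const bool high_bit_on = ((f >> 51) & 1) != 0;
    const std::string sign = negative ? "-" : "+";

    if (e == 0)   return sign + (f == 0 ? "zero" : "subnormal");
    if (e < 2047) return sign + "normal";
    if (f == 0)   return sign + "infinity";
    return sign + (high_bit_on ? "quiet NaN" : "signaling NaN");
}

// Convenience wrapper for a double. Assumption: double is 64-bit IEEE 754 binary.
inline std::string classify(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return classify_bits(bits);
}

inline void check_table() {
    assert(classify(0.0) == "+zero");
    assert(classify(-0.0) == "-zero");
    assert(classify(-2.5) == "-normal");
    assert(classify_bits(0x0000000000000001ull) == "+subnormal");
    assert(classify_bits(0xFFF0000000000000ull) == "-infinity");
}
```

The NaN rows are easiest to exercise through classify_bits, since passing a signaling NaN around as a double may quiet it on some platforms.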

    Some notes:

    +0 and −0 are mathematically equal, but the sign is preserved. Carefully written applications can make use of it in certain special situations.
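
A short sketch of how the preserved sign of zero can be observed in C++. It assumes IEEE 754 arithmetic (std::numeric_limits<double>::is_iec559 is true), since the C++ standard alone does not define division by zero; is_negative_zero, reciprocal, and check_signed_zero are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// True only for -0: it compares equal to zero but has its sign bit set.
inline bool is_negative_zero(double x) {
    return x == 0.0 && std::signbit(x);
}

// Assumption: IEEE 754 arithmetic, where 1/+0 is +infinity and 1/-0 is -infinity.
inline double reciprocal(double x) {
    return 1.0 / x;
}

inline void check_signed_zero() {
    static_assert(std::numeric_limits<double>::is_iec559, "sketch assumes IEEE 754 doubles");
    const double pz = 0.0;
    const double nz = -0.0;
    const double inf = std::numeric_limits<double>::infinity();

    assert(pz == nz);               // mathematically equal
    assert(!is_negative_zero(pz));  // but the sign is preserved
    assert(is_negative_zero(nz));
    assert(reciprocal(pz) == inf);  // and it shows up in later results
    assert(reciprocal(nz) == -inf);
}
```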

    NaN means “Not a Number”. Commonly, it means some non-mathematical result or other error has occurred, and a calculation should be discarded or redone another way. Generally, an operation with a NaN produces another NaN, thus preserving the information that something has gone wrong. For example, 3 + NaN produces a NaN. A signaling NaN is intended to cause an exception, either to indicate that a program has gone wrong or to allow other software (e.g., a debugger) to perform some special action. A quiet NaN is intended to propagate through to further results, allowing the rest of a large computation to be completed, in the cases where a NaN is only a part of a large set of data and will be handled separately later or will be discarded.
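
Quiet NaN propagation, and one way of discarding NaNs that are only part of a larger data set, can be sketched as follows. The function mean_ignoring_nan is a hypothetical helper for this illustration, and the behavior assumes default compiler settings (options such as fast-math may break NaN checks).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Averages the values that are not NaN; returns NaN if nothing usable remains.
inline double mean_ignoring_nan(const std::vector<double>& data) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double v : data) {
        if (std::isnan(v)) continue;  // discard the entries where something went wrong
        sum += v;
        ++count;
    }
    if (count == 0) return std::numeric_limits<double>::quiet_NaN();
    return sum / static_cast<double>(count);
}

inline void check_nan() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(std::isnan(3.0 + nan));  // an operation with a NaN produces another NaN
    assert(nan != nan);             // a NaN compares unequal even to itself
    assert(mean_ignoring_nan({1.0, nan, 3.0}) == 2.0);
}
```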

    The signs, + and −, are retained with NaNs but have no mathematical value.

    In normal programming, you should not be concerned about the floating-point encoding, except to the extent it informs you about the limits and behavior of floating-point calculations. You should not need to do anything special regarding subnormal numbers.

    Unfortunately, some processors are broken in that they either violate the IEEE 754 standard by changing subnormal numbers to zero or they perform very slowly when subnormal numbers are used. When programming for such processors, you may seek to avoid using subnormal numbers.
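
One portable way to avoid carrying subnormal numbers through a computation is to replace them with zero in software. The sketch below shows only that idea; flush_subnormal and check_flush are hypothetical names, and hardware-specific flush-to-zero modes, which some processors offer, are deliberately not shown.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Replaces a subnormal value with a zero of the same sign; other values pass through.
inline double flush_subnormal(double x) {
    return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0, x) : x;
}

inline void check_flush() {
    using limits = std::numeric_limits<double>;
    assert(flush_subnormal(limits::denorm_min()) == 0.0);         // subnormal becomes zero
    assert(std::signbit(flush_subnormal(-limits::denorm_min())));  // the sign is kept
    assert(flush_subnormal(limits::min()) == limits::min());      // smallest normal is untouched
    assert(flush_subnormal(1.0) == 1.0);
}
```

Note that this gives up the a != b implies a-b != 0 guarantee for the values it touches, so it is a trade-off rather than a fix.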