Tags: language-agnostic, floating-point, floating-point-precision, ieee-754

How gradual underflow is represented in binary


I'm wondering how underflowed data is represented in binary. In the case of a float we have 32 bits, and each of them has its own meaning. Where is the information stored that the mantissa is no longer normalized?


Solution

  • From the Wikipedia entry on IEEE-754:

    The number representations described above are called normalized, meaning that the implicit leading binary digit is a 1. To reduce the loss of precision when an underflow occurs, IEEE 754 includes the ability to represent fractions smaller than are possible in the normalized representation, by making the implicit leading digit a 0. Such numbers are called denormal. They don't include as many significant digits as a normalized number, but they enable a gradual loss of precision when the result of an arithmetic operation is not exactly zero but is too close to zero to be represented by a normalized number.

    A denormal number is represented with a biased exponent of all 0 bits, which represents an exponent of −126 in single precision (not −127), or −1022 in double precision (not −1023).

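    So there is a special exponent value (all zero bits) which signifies that the mantissa does not have an implicit leading 1, and that the number should therefore be interpreted as denormalised. No extra bit is needed: the exponent field itself acts as the flag. If the mantissa bits are also all zero, the value is ±0.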
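To make this concrete, here is a minimal Python sketch that pulls the sign, exponent and mantissa fields out of a single-precision float and decides how to interpret them. It assumes the `struct` module's `'>f'` format, which packs a value as IEEE 754 binary32. The helper names `float32_fields` and `classify_and_decode` are hypothetical, chosen just for this illustration.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int]:
    """Round x to IEEE 754 single precision and split it into its raw bit fields."""
    # '>f' packs as big-endian binary32; '>I' reads those 4 bytes back as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent field
    mantissa = bits & 0x7FFFFF       # 23 stored fraction bits
    return sign, exponent, mantissa


def classify_and_decode(x: float) -> tuple[str, float]:
    """Classify a float32 by its exponent field and rebuild its value from the bits."""
    sign, exponent, mantissa = float32_fields(x)
    s = -1.0 if sign else 1.0
    if exponent == 0xFF:
        # All-ones exponent is reserved for infinities and NaNs.
        return ("nan" if mantissa else "infinity"), x
    if exponent == 0:
        if mantissa == 0:
            return "zero", s * 0.0
        # Subnormal: no implicit leading 1, and the exponent is fixed at -126 (not 0 - 127).
        return "subnormal", s * (mantissa / 2**23) * 2.0**-126
    # Normal: implicit leading 1, exponent = field - bias (127).
    return "normal", s * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)


# Start at the smallest positive normal float32 and keep halving it.
FLT_MIN = 2.0**-126
x = FLT_MIN
for _ in range(4):
    _, exp_field, mant = float32_fields(x)
    kind, _ = classify_and_decode(x)
    print(f"{x:.3e}  exponent={exp_field:08b}  mantissa={mant:023b}  {kind}")
    x /= 2
```

Running it shows the gradual underflow the quote describes: at 2^-126 the exponent field is `00000001` and the value is normal; after the first halving the exponent field becomes all zeros, and each further halving shifts the single 1 bit of the mantissa one place to the right, losing one bit of precision per step instead of jumping straight to zero.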