
Relationships between 128-, 64-, and 32-bit IEEE-754 floating-point numbers


I want to become familiar and comfortable with floating-point numbers. I'm working on a project that I hope will help with this: implementing dynamically allocated, arbitrarily sized floating-point numbers in C++. I've looked through the IEEE-754 specifications for the standard floating-point formats, but I could not find a common relationship between them (I used the Wikipedia articles on the 32-, 64-, and 128-bit formats as references). So my question is: is there a common pattern among floating-point formats that can be applied to a floating-point number of any size?

If not, from a programming perspective, would it be easier to define my own floating-point representation that does follow a pattern?

EDIT: By pattern I mean the number of bits in the mantissa and in the exponent.


Solution

  • There is no mandated mathematical rule for the numbers of bits in the significand¹ or the exponent. IEEE 754-2008 does show a formula that describes its listed interchange formats for certain sizes, but only in a non-normative note:

    • For a storage width of k bits, the number of bits in the significand, p (the mathematical significand including the leading bit, not the field that primarily encodes it without the leading bit), is k − round(4 × log2(k)) + 13.
    • The number of bits in the exponent field, w, is k − p, since the remaining bits are one sign bit and the p − 1 bits of the trailing significand field.

    The formula does not hold for 16 or 32 bits; it is only said to hold for 64 bits and widths that are multiples of 32 greater than or equal to 128 (so not widths 32 or 96). I suppose you can consider it a suggestion for larger sizes, but it is not binding.
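To see where the note's formula agrees with the standard's formats and where it breaks down, here is a small C++ sketch (C++ because that is the language of the project in the question). The function names `formula_precision` and `formula_exponent_bits` are made up for illustration; they simply evaluate p = k − round(4 × log2(k)) + 13 and w = k − p.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: precision p (significand bits, counting the leading bit)
// from the non-normative IEEE 754-2008 formula p = k - round(4 * log2(k)) + 13.
int formula_precision(int k) {
    assert(k > 0);  // log2 is only defined for positive widths
    const long rounded = std::lround(4.0 * std::log2(static_cast<double>(k)));
    return k - static_cast<int>(rounded) + 13;
}

// Hypothetical helper: exponent field width w = k - p, because the k bits hold
// 1 sign bit, w exponent bits, and p - 1 trailing significand bits.
int formula_exponent_bits(int k) {
    return k - formula_precision(k);
}
```

Under this sketch, widths 64, 128, and 256 give (p, w) = (53, 11), (113, 15), and (237, 19), matching binary64, binary128, and binary256. Width 32 gives (25, 7) and width 16 gives (13, 3), whereas binary32 actually uses (24, 8) and binary16 uses (11, 5), which is why the note excludes them.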

    As far as I know, the parameters specified in Table 3.5 of clause 3.6 of IEEE 754-2008 arise from balancing trade-offs (such as range versus precision) and from historical usage. You can define formats with other parameters as described in clause 3.7, which gives recommendations for defining extended precisions either from the precision (the number of digits in the significand) together with the maximum exponent, or from the precision alone. Or you can disregard IEEE 754 and define your own formats. The standard is not mandatory, and what your design should be depends on your goals.
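Although the bit counts themselves follow no mandated rule, the encoding built on top of them is the same for every IEEE 754 binary format: a sign bit, a w-bit exponent field biased by emax = 2^(w−1) − 1, and a trailing significand field of p − 1 bits, with an all-zeros exponent reserved for zeros and subnormals and an all-ones exponent for infinities and NaNs. That uniformity is the part that carries over to arbitrary sizes. The sketch below assumes a hypothetical `BinaryFormat` struct holding only k and p, and a hypothetical `decode` function. It decodes into a `double`, so it only handles widths up to 64 bits, and values outside `double`'s range or precision get rounded or overflow in `std::ldexp`; an arbitrary-precision implementation would need big-integer storage instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical description of a binary format by its width k and precision p.
struct BinaryFormat {
    int k;  // total storage width in bits
    int p;  // precision: significand bits including the implicit leading bit
    int w() const { return k - p; }                    // exponent field width
    int bias() const { return (1 << (w() - 1)) - 1; }  // bias = emax = 2^(w-1) - 1
};

// Hypothetical decoder: interpret the low k bits of `bits` as a value of format f,
// using the same layout (sign | w-bit biased exponent | p-1 trailing bits) for any width.
double decode(const BinaryFormat& f, std::uint64_t bits) {
    // Limits of this sketch: fits in 64 bits, and exponent arithmetic fits in int.
    assert(f.k <= 64 && f.p >= 2 && f.w() >= 2 && f.w() <= 30);
    const int t = f.p - 1;  // width of the trailing significand field
    const std::uint64_t exp_max = (std::uint64_t{1} << f.w()) - 1;
    const std::uint64_t frac = bits & ((std::uint64_t{1} << t) - 1);
    const std::uint64_t exp = (bits >> t) & exp_max;
    const bool negative = ((bits >> (f.k - 1)) & 1) != 0;

    double magnitude;
    if (exp == exp_max) {
        // All-ones exponent: infinity if the trailing field is zero, NaN otherwise.
        magnitude = (frac == 0) ? std::numeric_limits<double>::infinity()
                                : std::numeric_limits<double>::quiet_NaN();
    } else if (exp == 0) {
        // Zero or subnormal: no implicit bit, exponent fixed at emin = 1 - bias.
        magnitude = std::ldexp(static_cast<double>(frac), 1 - f.bias() - t);
    } else {
        // Normal number: restore the implicit leading 1 bit.
        const std::uint64_t sig = frac | (std::uint64_t{1} << t);
        magnitude = std::ldexp(static_cast<double>(sig),
                               static_cast<int>(exp) - f.bias() - t);
    }
    return negative ? -magnitude : magnitude;
}
```

With this sketch, `decode(BinaryFormat{16, 11}, 0x3C00)` yields 1.0, `decode(BinaryFormat{32, 24}, 0x40490FDB)` yields the `float` closest to π, and `decode(BinaryFormat{64, 53}, 0x3FF8000000000000)` yields 1.5. The same function accepts a made-up format such as `BinaryFormat{24, 17}` without changes.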

    Note

    ¹ “Significand” is the preferred term for the fraction part of a floating-point number. “Mantissa” is a term for the fraction part of a logarithm. Significands are linear (if the number increases by a factor of 1.2, the significand increases by a factor of 1.2, unless an exponent threshold is crossed), whereas mantissas are logarithmic.
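The linear/logarithmic distinction in the note can be checked with `std::frexp` and `std::log10` from `<cmath>`. In this sketch, `significand` and `log_mantissa` are hypothetical helper names. Note that `std::frexp` normalizes the significand to [0.5, 1) rather than IEEE 754's [1, 2), which changes it by a constant factor of 2 but leaves ratios unchanged. The example uses a factor of 1.25 instead of the note's 1.2 so that every value is exactly representable.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: significand of x as returned by std::frexp, which
// normalizes to [0.5, 1) (IEEE 754 writes it in [1, 2); they differ by a factor of 2).
double significand(double x) {
    int exponent = 0;
    return std::frexp(x, &exponent);
}

// Hypothetical helper: "mantissa" in the logarithm sense, the fractional part of log10(x).
double log_mantissa(double x) {
    assert(x > 0);  // log10 is only defined for positive x
    const double l = std::log10(x);
    return l - std::floor(l);
}
```

Multiplying 1.5 by 1.25 gives 1.875: `significand(1.5)` is 0.75 and `significand(1.875)` is 0.9375, again a ratio of 1.25, while `log_mantissa` grows additively by log10(1.25). Multiplying 1.5 by 1.5 instead crosses a power of two, so `significand(2.25)` is 0.5625 and the ratio no longer carries over.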