Tags: floating-point, precision, normalization, floating-accuracy, subnormal-numbers

How is the implicit 1 stored and differentiated from the 0 for subnormals?


I think I understand why we treat the leading bit as an implicit 1 when normalizing, and how the values are represented. My only issue is the machine side of it: how are the implicit 1 and the 0 differentiated when the value is read/interpreted by the machine? I don't know if I am asking this correctly. Basically, how does the machine differentiate between a normalized and a subnormal number?

  1. How is it interpreted given that there are only 32 bits and the implicit 1 isn't included in those bits?

  2. How is the 0 for subnormals differentiated from the 1 by the machine?

  3. When it comes to accuracy and precision, what could be said about floating point? Would it be correct to say that as the actual exponent increases, precision decreases, and vice versa? How does accuracy factor into this?


Solution

    1. How is it interpreted given that there are only 32 bits and the implicit 1 isn't included in those bits?

    2. How is the 0 for subnormals differentiated from the 1 by the machine?

    If the bits in the exponent field are all zeros, then the leading bit of the significand is zero. (It is a misnomer to call it an implicit bit; it is formally specified as a function of the exponent field, so it is explicitly specified rather than implied.)

    If the bits in the exponent field are neither all zeros nor all ones, then the leading bit of the significand is one.

    If the bits in the exponent field are all ones, then the floating-point object represents either an infinity (if the significand field is all zeros) or a NaN (otherwise).

    The exponent field values of 0 and 1 actually encode the same exponent, −126 for the IEEE-754 “single precision” format (binary32). An exponent field value of e where 0 < e < 255 encodes an exponent value of E = e − 127, so 1 encodes −126. The exponent field value of 0 also encodes −126; the only difference between the exponent field values of 0 and 1 is that 0 means the leading bit of the significand is 0 and 1 means the leading bit of the significand is 1.
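    To make these decoding rules concrete, here is a small C sketch (my own illustration, not part of the original answer or of the standard's text) that pulls apart a binary32 bit pattern exactly as described above: the leading significand bit is never stored; it is derived from the exponent field.

    ```c
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Decode a binary32 value the way hardware does: the leading significand
       bit is not stored; it is determined by the exponent field. */
    static void decode_binary32(float x)
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);             /* view the raw 32 bits */

        int      sign     = bits >> 31;
        unsigned expField = (bits >> 23) & 0xFF;    /* 8-bit exponent field */
        unsigned fraction = bits & 0x7FFFFF;        /* 23 stored significand bits */

        if (expField == 0xFF) {                     /* all ones: infinity or NaN */
            printf("%-12g -> %s\n", x, fraction ? "NaN" : "infinity");
            return;
        }

        /* Exponent fields 0 and 1 both mean exponent -126; field 0 additionally
           means the leading significand bit is 0 (subnormal or zero). */
        int leadingBit = (expField != 0);
        int exponent   = leadingBit ? (int) expField - 127 : -126;

        double significand = leadingBit + fraction / 8388608.0;  /* fraction / 2^23 */
        double value = (sign ? -1.0 : 1.0) * ldexp(significand, exponent);

        printf("%-12g -> %s, leading bit %d, fraction 0x%06X, exponent %d, rebuilt %g\n",
               x, leadingBit ? "normal" : "subnormal/zero",
               leadingBit, fraction, exponent, value);
    }

    int main(void)
    {
        decode_binary32(1.0f);
        decode_binary32(0.75f);
        decode_binary32(1e-40f);   /* subnormal in binary32 */
        decode_binary32(0.0f);
        return 0;
    }
    ```

    The "rebuilt" value printed by the sketch matches the input in every case, which is the point: the machine never needs the leading bit in storage, because the exponent field tells it whether that bit is 0 or 1.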

    3. When it comes to accuracy and precision, what could be said about floating point?

    A great deal; entire books have been written about it, such as the Handbook of Floating-Point Arithmetic, second edition, 2018, by Jean-Michel Muller et al.

    Would it be correct to say that as the actual exponent increases, precision decreases, and vice versa?

    Precision generally refers to the number of digits in the significand, so it does not change as the exponent changes. The precision of an IEEE-754 “single precision” number is 24 bits.
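    A quick way to see this (again an illustrative sketch of mine, not from the original answer): the number of significand bits in binary32 is fixed at 24, so the relative spacing between adjacent representable values stays roughly the same at every magnitude, even though the absolute spacing grows with the exponent.

    ```c
    #include <float.h>
    #include <math.h>
    #include <stdio.h>

    /* The precision of binary32 is fixed at 24 significand bits (FLT_MANT_DIG).
       The absolute spacing between adjacent values grows with the exponent, but
       the relative spacing (spacing / value) stays within a factor of two of
       2^-23 regardless of magnitude. */
    int main(void)
    {
        printf("binary32 precision: %d bits\n", FLT_MANT_DIG);   /* prints 24 */

        float samples[] = { 1.0f, 1024.0f, 1.0e20f };
        for (int i = 0; i < 3; ++i) {
            float x   = samples[i];
            float gap = nextafterf(x, INFINITY) - x;  /* one unit in the last place */
            printf("x = %-10g  ulp = %-12g  ulp/x = %g\n", x, gap, gap / x);
        }
        return 0;
    }
    ```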

    How does accuracy factor into this?

    Accuracy depends on how the arithmetic is used and the data it is used with. The error in floating-point results can range from zero to infinity, or the result can be a NaN. Some uses of floating-point are “stable” and produce results close to the ideal results. Other uses are unstable and produce results that have gone far astray.
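    As a tiny illustration (my own example, not from the original answer) of how a result can go astray: in binary32 the spacing between adjacent values near 10^8 is 8, so adding 1 to 10^8 is lost to rounding, and the subtraction that follows returns 0 instead of 1. The same steps carried out in binary64 are exact.

    ```c
    #include <stdio.h>

    /* Catastrophic loss of accuracy in binary32: the ulp near 1e8 is 8, so the
       added 1 is below the rounding threshold and the final subtraction yields 0.
       In binary64, 1e8 + 1 is exactly representable and the result is 1. */
    int main(void)
    {
        float  a32 = 1.0e8f;
        float  b32 = a32 + 1.0f;       /* rounds back to 1e8 */
        printf("binary32: (1e8 + 1) - 1e8 = %g\n", b32 - a32);   /* prints 0 */

        double a64 = 1.0e8;
        double b64 = a64 + 1.0;        /* exact in binary64 */
        printf("binary64: (1e8 + 1) - 1e8 = %g\n", b64 - a64);   /* prints 1 */

        return 0;
    }
    ```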