
normalize the result of a floating point addition


In floating point arithmetic I added two binary numbers 1.1100*2^4 and 0.0110*2^4 and I got 10.0010*2^4.

How many left shifts should I do to normalize it?

Is it 10.0010*2^4, 1.0001*2^4, or 0.1000*2^4, assuming it is 4-bit?

I have seen 0.1000 treated as the normalized value, but some articles say the digit before the point should be nonzero, and another article describes a nonzero digit there as an overflow.


Solution

  • The form used most in the IEEE 754 standard is that a number is in normal form if its significand is in the interval [1, b), where b is the base of the floating-point format. This may be called the scientific form.

    The standard notes that an alternative form may be convenient, in which the significand is viewed as an integer; in that form, normal significands lie in the interval [b^(p−1), b^p), where p is the precision of the format, that is, the number of base-b digits in the significand.

    The C standard uses a form in which a normal significand is in [1/b, 1).

    You are probably using the first of these. For that form, you would not shift the significand of 10.0010₂•2^4 left at all. To put it in normal form, you would shift it right one bit and add one to the exponent, giving 1.0001₂•2^5 (the bit shifted out is 0, so no precision is lost).
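
The right shift described above can be checked with a minimal Python sketch. It assumes the first form, with the significand in [1, b); `normalize` is a hypothetical helper written for this illustration, not a library function, and it uses exact fractions so no rounding is involved. The last lines use the standard `math.frexp`, which follows the C convention with the significand in [1/2, 1), to show the same value in that form.

```python
from fractions import Fraction
import math


def normalize(significand, exponent, base=2):
    """Return (significand, exponent) with |significand| in [1, base).

    Hypothetical helper for illustration: the represented value
    significand * base**exponent is unchanged.
    """
    sig = Fraction(significand)
    if sig == 0:
        return sig, exponent  # zero has no normal form
    sign = -1 if sig < 0 else 1
    sig = abs(sig)
    while sig >= base:  # too large: shift right, raise the exponent
        sig /= base
        exponent += 1
    while sig < 1:      # too small: shift left, lower the exponent
        sig *= base
        exponent -= 1
    return sign * sig, exponent


# 10.0010 in binary is 2 + 1/8 = 17/8, so the sum is (17/8) * 2^4 = 34.
sig, exp = normalize(Fraction(17, 8), 4)
print(sig, exp)  # 17/16 5, i.e. 1.0001 (binary) * 2^5

# C-style form, significand in [1/2, 1): 34 = 0.53125 * 2^6 = 0.10001 (binary) * 2^6
print(math.frexp(34.0))  # (0.53125, 6)
```

Both results represent the same value, 34; only the convention for where the binary point sits relative to the leading 1 differs.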