Tags: floating-point, numbers, storage, computer-science

How do computers store floating-point numbers?


For positive numbers, the binary representation can be found easily. For negative numbers it is the two's complement of the positive number. But how does a computer store or understand floating-point numbers? Many explanations online show a floating-point number with a point, written as we normally write it. But how do real computers store and work with it internally?


Solution

  • There are multiple levels of representation of floating-point numbers. Here is information about the two levels people are most often interested in.

    A floating-point number is ±significand•base^exponent with a fixed base and certain requirements on significand and exponent. The exponent is an integer within a range defined by the format, and the significand is a number representable by a numeral using some number of base-base digits, where the number of digits is defined by the format.

    There may be variations on this basic format. For example, the significand may include a radix point (the generalization of a decimal point), so that a format might have significands that are all integers (137., 954., and so on) or that have the radix point at some other fixed location (often just after the first digit, so 1.37, 9.54, and so on). These variations are mathematically equivalent, with the exponent range adjusted to compensate.

    Thus, +1.23456•10^13 is a decimal floating-point number with six decimal digits. The point “floats” because we multiply by a power of the base that effectively moves the radix point.
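    This sign/digits/exponent decomposition can be observed directly with Python's standard decimal module, which happens to use the "integer significand" variation mentioned above (radix point at the right of the digits, with the exponent adjusted to compensate):

    ```python
    from decimal import Decimal

    # Decimal stores exactly a (sign, digits, exponent) triple.
    # 1.23456 * 10^13 is held as the integer 123456 times 10^8.
    t = Decimal("1.23456E13").as_tuple()
    print(t)  # sign=0, digits=(1, 2, 3, 4, 5, 6), exponent=8
    ```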

    At this level of representation, a floating-point format may include some special values, notably +∞, −∞, and NaNs (Not a Number “values” that indicate no number is represented).

    The other level of most interest is encoding a floating-point number into bit strings. With IEEE-754 double-precision, it is done this way:

    • Write the number in the format ±significand•2^exponent, where significand is represented as a 53-bit binary numeral with the radix point after the first digit and the exponent is in [−1022, +1023]. If the exponent is not −1022, the first digit of significand must be 1. (If it is not, subtract one from the exponent and shift the significand bits left one position, until either the exponent is −1022 or the first digit of significand is 1. Any number that cannot be put into this format is not representable in IEEE-754 double precision.)
    • For +, write “0”. For “−”, write “1”.
    • If the first digit of significand is zero, write “00000000000”. This is a special exponent code that represents subnormal numbers, meaning numbers that cannot be shifted to have a first bit of 1. If the first digit is not zero, add 1023 to the exponent, convert it to an eleven-digit binary numeral, and write that numeral. For example, the exponent −3 is biased to become 1020, the binary for that is “01111111100”, so “01111111100” is written.
    • Write the 52 bits of the significand after the first digit.
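    As a sketch of the steps above, the fields of an already-encoded double can be pulled apart with Python's struct module (the field widths and the 1023 bias are from the IEEE-754 double-precision format; the helper name decode_double is my own):

    ```python
    import struct

    def decode_double(x):
        # Reinterpret the 64-bit IEEE-754 encoding of x as an unsigned integer.
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))
        sign = bits >> 63                  # 1 sign bit
        biased_exp = (bits >> 52) & 0x7FF  # 11 exponent bits (bias 1023)
        fraction = bits & ((1 << 52) - 1)  # 52 significand bits after the first digit
        return sign, biased_exp, fraction

    # 1.0 is +1.000...0 * 2^0: sign 0, exponent 0 biased to 1023, fraction 0.
    print(decode_double(1.0))     # (0, 1023, 0)
    # -0.125 is -1.0 * 2^-3: the exponent -3 is biased to 1020, as above.
    print(decode_double(-0.125))  # (1, 1020, 0)
    ```

    Infinities use the all-ones exponent code 2047 with a zero fraction, and NaNs use that code with a nonzero fraction.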

    But how do the real computers store or work with it internally?

    Once we have encoded floating-point numbers as above, computers work with the parts. The significands behave largely like integers, and the exponents tell us about shifting them.

    When two numbers of the same sign are added, their exponents are compared. The significand of one number is shifted to adjust its position relative to the other according to the difference in exponents. Then the two significands are added, and the result is adjusted (rounded if necessary) to fit into the floating-point format. When two numbers are multiplied, their significands are multiplied, and their exponents are added together. Subtraction, division, and other operations proceed in the same way, by operating on the parts of the representations.
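    A toy sketch of that alignment step, working on (significand, exponent) pairs with integer significands; rounding, signs, and exponent-range handling of real hardware are deliberately omitted, and the names fp_add and fp_mul are my own:

    ```python
    def fp_add(a, b):
        # Same-sign addition: align significands by the exponent difference,
        # then add them as integers.
        (sig_a, exp_a), (sig_b, exp_b) = a, b
        if exp_a < exp_b:
            sig_b <<= exp_b - exp_a   # shift the larger-exponent significand left
            exp = exp_a
        else:
            sig_a <<= exp_a - exp_b
            exp = exp_b
        return (sig_a + sig_b, exp)

    def fp_mul(a, b):
        # Multiplication: multiply significands, add exponents.
        return (a[0] * b[0], a[1] + b[1])

    # 1.5 = 3 * 2^-1 and 0.25 = 1 * 2^-2:
    print(fp_add((3, -1), (1, -2)))  # (7, -2), i.e. 1.75
    print(fp_mul((3, -1), (1, -2)))  # (3, -3), i.e. 0.375
    ```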

    There are various complications, such as having to deal with bounds on the exponents, needing to shift significands back to the normal form (leading digit not zero) after arithmetic, and so on.