
Why can we convert a float to binary by multiplying by 2?


I know how to convert a floating-point number to binary by following this question: “How to convert float number to Binary?”

But why does it work this way? 0.625 × 2 gives 1.25, and 1.25 is a base-10 number. Why can we use the integer part of a base-10 number to convert to binary?


Solution

  • It helps to use clear terminology: 1.25 is a number, an abstract mathematical entity. Five-fourths and 1¼ are the same number. “1.25” is a numeral, a sequence of text or writing that stands for a number.

    When we multiply 0.625 by 2, get 1.25, take 1 for a binary digit, and continue to work with 0.25, we are working with numbers, but we are using numerals as tools for that. If I transformed 0.625 to some other numeral system and multiplied by 2, I would get 1.25 in that other numeral system. It is the arithmetic we are concerned with and the values it produces, not the particular numeral system used.

    For example, in base 60, $0.625_{10}$ is $(0).(37)(30)_{60}$. I have had to use parentheses to denote the base-60 digits. A “0” followed by a “.” and digit number 37 and digit number 30 is a numeral for 0.625 in base 60. If I multiply it by 2, the (30) becomes (60), which is (00) and a carry; the (37) becomes (74), has the carry added to give (75), and becomes (15) and a carry. So the result is $(1).(15)_{60}$. That equals 1¼. So we could convert into binary using any base we like.
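The carry arithmetic in the base-60 example can be checked mechanically. The following is a minimal sketch, not part of the original answer: the function name `double_fraction_digits` and the choice to pass the fractional digits as a list of integers are assumptions made for illustration. It doubles a numeral digit by digit in any base, from the rightmost digit to the leftmost.

```python
def double_fraction_digits(int_part, frac_digits, base):
    """Double a numeral given as an integer part plus fractional digits.

    The digits are processed right to left, carrying into the next
    position whenever a doubled digit reaches the base.
    Returns (new_int_part, new_frac_digits).
    """
    carry = 0
    doubled = []
    for digit in reversed(frac_digits):
        total = 2 * digit + carry
        doubled.append(total % base)  # digit that stays in this position
        carry = total // base         # 0 or 1, passed to the left
    doubled.reverse()
    return 2 * int_part + carry, doubled


# 0.625 written in base 60, base 10, and base 2 (hypothetical inputs).
print(double_fraction_digits(0, [37, 30], 60))   # (1, [15, 0])
print(double_fraction_digits(0, [6, 2, 5], 10))  # (1, [2, 5, 0])
print(double_fraction_digits(0, [1, 0, 1], 2))   # (1, [0, 1, 0])
```

In all three bases the integer part of the doubled value is 1, which is the first binary digit of 0.625, whichever numeral system did the work.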

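    By definition of positional notation, the binary digits $d_j d_{j-1} d_{j-2} \ldots d_2 d_1 d_0 . d_{-1} d_{-2} \ldots$ represent the number

    $d_j\cdot 2^{j} + d_{j-1}\cdot 2^{j-1} + d_{j-2}\cdot 2^{j-2} + \ldots + d_2\cdot 2^{2} + d_1\cdot 2^{1} + d_0\cdot 2^{0} + d_{-1}\cdot 2^{-1} + d_{-2}\cdot 2^{-2} + \ldots$

    With 0.625, we can see the digits $d_j d_{j-1} d_{j-2} \ldots d_2 d_1 d_0$ are all zero. So we want to find $d_{-1}$ and any digits following it.

    We can write $0.625 = d_{-1}\cdot 2^{-1} + d_{-2}\cdot 2^{-2} + d_{-3}\cdot 2^{-3} + \ldots$

    Then multiply both sides by 2: $1.25 = d_{-1}\cdot 2^{0} + d_{-2}\cdot 2^{-1} + d_{-3}\cdot 2^{-2} + \ldots$ (note the exponents increased).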

    We know from studying geometric series that the terms $d_{-2}\cdot 2^{-1} + d_{-3}\cdot 2^{-2} + \ldots$ sum to less than one (supposing the digits are not all ones). So, for the sum to equal 1.25, $d_{-1}$ must be 1. We have our first digit.
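The geometric-series bound used in that step can be written out explicitly. This is a short worked version, assuming only that every digit is 0 or 1:

```latex
\[
0 \;\le\; d_{-2}\cdot 2^{-1} + d_{-3}\cdot 2^{-2} + d_{-4}\cdot 2^{-3} + \cdots
\;\le\; \sum_{k=1}^{\infty} 2^{-k} \;=\; 1,
\]
\[
\text{with equality on the right only when every digit is } 1 .
\]
```

Excluding the all-ones case, the tail lies in the interval from 0 (inclusive) to 1 (exclusive), so $d_{-1}$ has to be exactly the integer part of the doubled value.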

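    And then we can write $1.25 = 1\cdot 2^{0} + d_{-2}\cdot 2^{-1} + d_{-3}\cdot 2^{-2} + \ldots$

    So $0.25 = d_{-2}\cdot 2^{-1} + d_{-3}\cdot 2^{-2} + \ldots$

    Now multiply by 2 again: $0.5 = d_{-2}\cdot 2^{0} + d_{-3}\cdot 2^{-1} + \ldots$

    Now $d_{-2}$ must be 0, or the sum would be larger than 0.5. So that is our second digit.

    Then we have $0.5 = 0\cdot 2^{0} + d_{-3}\cdot 2^{-1} + \ldots$

    And $0.5 = d_{-3}\cdot 2^{-1} + \ldots$

    Multiplying by 2 another time gives $1 = d_{-3}\cdot 2^{0} + \ldots$

    Now we see $d_{-3}$ is 1 and all trailing digits are 0.

    So $0.625_{10} = 0.101_2$.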
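The whole derivation can be sketched as a short loop: multiply by 2, take the integer part as the next digit, and continue with the remainder. This is a minimal sketch rather than a definitive implementation. The function name `fraction_to_binary_digits` and the `max_digits` cutoff are assumptions made for illustration, and Python's `fractions.Fraction` is used so that the arithmetic is exact and independent of how the input numeral was written.

```python
from fractions import Fraction


def fraction_to_binary_digits(x, max_digits=32):
    """Return the binary digits after the point for a value 0 <= x < 1.

    Each step doubles the value; the integer part (0 or 1) is the next
    digit, and the fractional remainder carries on to the next step.
    Stops early when the remainder reaches zero.
    """
    x = Fraction(x)
    if not 0 <= x < 1:
        raise ValueError("expected a value in the range [0, 1)")
    digits = []
    while x != 0 and len(digits) < max_digits:
        x *= 2
        digit = int(x)  # integer part of the doubled value
        digits.append(digit)
        x -= digit      # keep only the fractional remainder
    return digits


print(fraction_to_binary_digits("0.625"))               # [1, 0, 1]
print(fraction_to_binary_digits("0.1", max_digits=8))   # [0, 0, 0, 1, 1, 0, 0, 1]
```

The second call shows why a cutoff is needed: 0.1 has no finite binary expansion, so the loop would otherwise never reach a zero remainder.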