floating-point, binary, subtraction

How to calculate -1.10 + 0.0110 in binary?


I need help with binary subtraction of floating-point numbers. I have to compute -1.10 + 0.0110.

a) I do not know how to interpret -1.10. If a number is signed, the most significant bit has to be "1". Here I do not know the number of bits, so I cannot tell whether -1.10 has zeros or ones to its left. What does it mean to put "-" in front of a binary number?

b) Part 1) in the picture shows my subtraction and check. Obviously, I am doing something wrong. When I borrow from the left, should the minuend have "1"s at all bit positions to the left, or only as many ones as the subtrahend needs? What am I doing wrong here?

c) In 2) I try to use two's complement. Should I complement both numbers, or only the subtrahend? Should I complement the result back after the subtraction? Should I align the numbers at the binary point? What am I doing wrong here?

The result has to be 1.00, if rounded after the machine addition to 2 digits. But I do not get 1.00 anyway.

Could you show me how to calculate this particular example and answer questions a), b), and c)? I came to a standstill in my numerical lecture because of my binary sins.

Example


Solution

  • Using base 2 math:

    -1.10 + 0.0110 is the same as -(1.10 - 0.0110)

    1.10 - 0.0110, with the operands aligned at the binary point, is

       1.10
     - 0.0110
     --------
       1.0010   
    

    Rounding to 2 places after the point, 1.0010 lies exactly half-way between 1.00 and 1.01. For such ties, the typical rule is to round to the even value, the one whose last kept bit is 0:

       1.00   
    

    Applying the - from the beginning, final answer:

      -1.00
    

    Floating-point encodings overwhelmingly do not use 2's complement, but sign-magnitude with a biased exponent. @Patricia Shanahan
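The steps above can be checked with a short Python sketch. It is only an illustration, not part of the original answer: the helper names parse_binary, round_half_even, to_binary and float_bits are made up for this example, and exact Fraction arithmetic stands in for the pencil-and-paper subtraction. The last part assumes IEEE 754 single precision, where a separate sign bit carries the "-".

```python
from fractions import Fraction
import struct


def parse_binary(text: str) -> Fraction:
    """Convert a signed binary string such as '-1.10' to an exact Fraction."""
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-")
    whole, _, frac = digits.partition(".")
    value = Fraction(int(whole or "0", 2))
    if frac:
        value += Fraction(int(frac, 2), 2 ** len(frac))
    return sign * value


def round_half_even(value: Fraction, frac_bits: int) -> Fraction:
    """Round to frac_bits binary places, with ties going to the even value."""
    scaled = value * 2 ** frac_bits
    # round() on a Fraction rounds exact halves to the even integer.
    return Fraction(round(scaled), 2 ** frac_bits)


def to_binary(value: Fraction, frac_bits: int) -> str:
    """Format a Fraction as sign plus magnitude with frac_bits binary places."""
    sign = "-" if value < 0 else ""
    scaled = abs(value) * 2 ** frac_bits
    assert scaled.denominator == 1, "value needs more fractional bits"
    n = int(scaled)
    whole = n >> frac_bits
    frac = n & ((1 << frac_bits) - 1)
    return f"{sign}{whole:b}.{frac:0{frac_bits}b}"


def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x as an integer."""
    return int.from_bytes(struct.pack(">f", x), "big")


exact = parse_binary("-1.10") + parse_binary("0.0110")
rounded = round_half_even(exact, 2)

print(to_binary(exact, 4))    # -1.0010
print(to_binary(rounded, 2))  # -1.00

# Sign-magnitude: 1.0 and -1.0 differ only in the top (sign) bit.
print(f"{float_bits(1.0):032b}")
print(f"{float_bits(-1.0):032b}")
```

The sign is handled separately from the magnitude in both the parsing and the formatting, which mirrors the -(1.10 - 0.0110) approach: subtract the magnitudes, round, then reapply the sign.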