Is it possible to represent -3/32 as a binary floating-point value using only 7 bits?


Suppose you are limited to 7 bits for a floating-point representation: 1 sign bit, 3 exponent bits, and 3 fraction bits.

First I convert 3/32 to the binary 0.00011,
then to the standard scientific notation of 1.1 * 2^(-4).

At this point I realize my stored exponent field would be -4 + 3 = -1 (with a bias of 3), which is not valid.
I try to represent 3/32 as 0.11 * 2^(-3) instead, which leads to the more intuitive representation of 1 000 110.
However, obviously this is a denormalized value, and if I try to convert the representation back to decimal I get -3/16.

My question is: is it even possible to represent this value precisely within the constraints of the problem?
It looks like the most negative finite value representable in this scheme is -15, so -3/32 falls within the representable range.
I'm aware that bits are dropped and precision is lost during conversions; is this the case here?


Solution

  • With 1 sign, 3 exponent, and 3 significand bits, following IEEE-754 rules, here are the four smallest non-negative finite values you can represent:

    Bits       | Decimal Value
    -----------+----------------
    0b0000000  | 0
    0b0000001  | 0.03125
    0b0000010  | 0.0625
    0b0000011  | 0.09375
    

    The magnitude you're looking for, 3/32, equals 0.09375 in decimal, which matches the fourth entry. So it is exactly representable in this format, as a subnormal value.

    Detailed representation of this value is:

                      6 543 210
                      S E3- S3-
       Binary layout: 0 000 011
          Hex layout: 03
           Precision: 3 exponent bits, 3 significand bits
                Sign: Positive
            Exponent: -2 (Subnormal, with fixed exponent value. Stored: 0, Bias: 3)
      Classification: FP_SUBNORMAL
              Binary: 0b1.1p-4
               Octal: 0o6p-6
                 Hex: 0x1.8p-4
    

    Since you wanted -3/32, you can simply set the sign bit, which gives 1 000 011 (hex 43).
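
To check this by hand-free means, here is a minimal Python sketch of a decoder for the 7-bit layout discussed above. It assumes IEEE-754-style rules: a bias of 3, subnormals when the exponent field is all zeros, and the all-ones exponent reserved for infinities and NaN. The names `decode` and `encode_exact` are hypothetical helpers made up for this illustration, not part of any library.

```python
from fractions import Fraction

EXP_BITS = 3
FRAC_BITS = 3
BIAS = 2 ** (EXP_BITS - 1) - 1  # 3, assuming the usual IEEE-754-style bias


def decode(bits: int):
    """Decode a 7-bit pattern (1 sign, 3 exponent, 3 fraction bits).

    Returns an exact Fraction for finite values, or a float for inf/NaN.
    Note: negative zero decodes to plain 0, since Fraction has no signed zero.
    """
    sign = -1 if (bits >> (EXP_BITS + FRAC_BITS)) & 1 else 1
    exp = (bits >> FRAC_BITS) & ((1 << EXP_BITS) - 1)
    frac = bits & ((1 << FRAC_BITS) - 1)

    if exp == (1 << EXP_BITS) - 1:
        # All-ones exponent: infinity if the fraction is 0, otherwise NaN.
        return sign * float("inf") if frac == 0 else float("nan")
    if exp == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - BIAS.
        significand = Fraction(frac, 1 << FRAC_BITS)
        scale = 1 - BIAS
    else:
        # Normal value: implicit leading 1, exponent is stored value minus bias.
        significand = 1 + Fraction(frac, 1 << FRAC_BITS)
        scale = exp - BIAS
    return sign * significand * Fraction(2) ** scale


def encode_exact(value: Fraction):
    """Brute-force search: return the first pattern equal to `value`, or None."""
    for bits in range(1 << (1 + EXP_BITS + FRAC_BITS)):
        decoded = decode(bits)
        if isinstance(decoded, Fraction) and decoded == value:
            return bits
    return None


if __name__ == "__main__":
    print(format(encode_exact(Fraction(-3, 32)), "07b"))  # 1000011
    print(decode(0b1000110))  # -3/16, the pattern tried in the question
```

With only 128 patterns, an exhaustive search is the simplest way to test exact representability. Under these assumptions, the pattern 1 000 110 from the question decodes to -3/16 because a zero exponent field fixes the exponent at -2 rather than -3, so the fraction bits must be 011, not 110.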