Suppose you are limited to 7 bits for a floating-point representation: 1 sign bit, 3 exponent bits, and 3 fraction bits.
First I convert 3/32 to binary, 0.00011, then to normalized scientific notation, 1.1 * 2^(-4). At this point I realize my stored exponent field would be -4 + 3 = -1 (with a bias of 3), which is not valid.
I try to represent 3/32 as 0.11 * 2^(-3) instead, which leads to the more intuitive representation of 1 000 110. However, this is clearly a denormalized value, and if I convert the representation back to decimal I get -3/16.
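To double-check that decoding, here is a quick sketch, assuming IEEE-754-style subnormal rules for this format (bias 3, so a subnormal's exponent is fixed at 1 - 3 = -2):

```python
# Decode my attempted encoding 1 000 110 by hand.
# Assumption: IEEE-754-style subnormals, bias = 3, exponent fixed at 1 - bias = -2.
bits = 0b1000110
sign = -1.0 if (bits >> 6) & 1 else 1.0
frac = bits & 0b111                     # exponent field is 000, so subnormal
value = sign * (frac / 8) * 2.0 ** -2   # 0.110b * 2^-2, negated
print(value)                            # -0.1875, i.e. -3/16
```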
My question is: is it even possible to represent this value precisely within the constraints of the problem?
It looks like the most negative finite value for this scheme is -15, so -3/32 falls within the representable range.
I'm aware that bits are dropped and precision is lost during conversions; is this the case here?
With 1 sign bit, 3 exponent bits, and 3 significand bits, following IEEE-754 rules, here are the four smallest non-negative finite values you can represent:
Bits | Decimal Value
-----------+----------------
0b0000000 | 0
0b0000001 | 0.03125
0b0000010 | 0.0625
0b0000011 | 0.09375
The value you're looking for, 3/32, equals 0.09375 in decimal, matching the fourth row. So it is precisely representable in this format.
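The table values can be reproduced with a small decoder; this is a sketch assuming the standard IEEE-754 rules described above (bias 3, all-zero exponent means subnormal, all-ones exponent means infinity/NaN):

```python
def decode(bits: int) -> float:
    """Decode a 7-bit float: 1 sign, 3 exponent, 3 significand bits, bias 3."""
    sign = -1.0 if (bits >> 6) & 1 else 1.0
    exp = (bits >> 3) & 0b111
    frac = bits & 0b111
    bias = 3
    if exp == 0:                       # subnormal: exponent fixed at 1 - bias = -2
        return sign * (frac / 8) * 2.0 ** (1 - bias)
    if exp == 0b111:                   # all-ones exponent: inf or NaN
        return sign * float("inf") if frac == 0 else float("nan")
    return sign * (1 + frac / 8) * 2.0 ** (exp - bias)

# The four smallest non-negative encodings from the table:
for b in range(4):
    print(f"0b{b:07b} -> {decode(b)}")   # 0, 0.03125, 0.0625, 0.09375
```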
Detailed representation of this value is:
Bit position: 6 543 210
Field layout: s eee mmm  (sign, exponent, significand)
Binary layout: 0 000 011
Hex layout: 03
Precision: 3 exponent bits, 3 significand bits
Sign: Positive
Exponent: -2 (Subnormal, with fixed exponent value. Stored: 0, Bias: 3)
Classification: FP_SUBNORMAL
Binary: 0b1.1p-4
Octal: 0o6p-6
Hex: 0x1.8p-4
Since you wanted -3/32, you can simply set the sign bit, giving 0b1000011.
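As a quick sanity check, here is a sketch (again assuming bias 3 and the fixed subnormal exponent of -2) showing that flipping the sign bit on the positive encoding yields -3/32:

```python
# Take the positive encoding of 3/32 and set the sign bit (bit 6).
pos = 0b0000011
neg = pos | (1 << 6)                   # -> 0b1000011
frac = neg & 0b111                     # exponent field is 000, so subnormal
value = -(frac / 8) * 2.0 ** (1 - 3)   # 0.011b * 2^-2, negated
print(value)                           # -0.09375, i.e. -3/32
```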