My answers to the problem below differ from the answer key. The problem: We assume that IEEE decided to add a new 8-bit representation with its main characteristics consistent with the 32/64-bit representations. Consider the following four 8-bit numbers:
A: 11100101 B: 00111001 C: 00001100 D: 00011101
The decimal values represented by the above numbers are, in no particular order: 3.125, −21, 29/32, 3/8.
Q1: Which 8-bit floating-point number represents 29/32 (choose from A, B, C, D)? A1: D
Given the above information, figure out the following:
Q2: Number of bits needed for the exponent. A2: 3
Q3: Number of bits needed for the fraction. A3: 4
I agree with the answer to Q1, but I got different answers for Q2 and Q3 (A2: 2 and A3: 5).

29/32 = 29 * 2^-5, which in binary is 11101 * 2^-5. Shifting the radix point to get the normalized binary form gives 1.1101 * 2^-1, so the answer to Q1 should be the bit pattern whose fraction field ends in 1101, hence D.

For Q2, suppose the answer is 3, so D splits as 0 001 1101: frac = 1101, exp = 001 (normalized). The bias is 2^(3-1) - 1 = 3, so E = exp - bias = 1 - 3 = -2. Converting back to normalized form (1.frac * 2^E) gives 1.1101 * 2^-2 = 11101 * 2^-6 = 29/64, not the 29/32 stated initially.

But when I use the representation 0 00 11101, with 2 bits for exp (bias = 2^1 - 1 = 1) and 5 bits for frac, the results match: exp = 00, so the denormalized interpretation applies (0.frac * 2^E, where E = (exp + 1) - bias), giving E = 0 + 1 - 1 = 0 and 0.11101 * 2^0 = 11101 * 2^-5 = 29/32.

What am I doing wrong? Thank you!
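To double-check the arithmetic for both candidate layouts, here is a small decoder (my own sketch, not part of the question; the function name `decode` is mine). It decodes the byte D = 00011101 once as sign/3-bit exp/4-bit frac and once as sign/2-bit exp/5-bit frac:

```python
def decode(bits, exp_bits, frac_bits):
    """Decode an 8-bit IEEE-754-style float: 1 sign bit, then exponent, then fraction."""
    bias = (1 << (exp_bits - 1)) - 1
    sign = (bits >> 7) & 1
    e = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    f = bits & ((1 << frac_bits) - 1)
    if e == 0:                                   # subnormal: 0.f * 2^(1 - bias)
        mag = (f / (1 << frac_bits)) * 2.0 ** (1 - bias)
    else:                                        # normal: 1.f * 2^(e - bias)
        mag = (1 + f / (1 << frac_bits)) * 2.0 ** (e - bias)
    return -mag if sign else mag

D = 0b00011101
print(decode(D, exp_bits=3, frac_bits=4))   # 0.453125 = 29/64
print(decode(D, exp_bits=2, frac_bits=5))   # 0.90625  = 29/32
```

This reproduces the discrepancy described above: the 3-exponent/4-fraction layout yields 29/64, while the 2-exponent/5-fraction layout (where D is subnormal) yields 29/32.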
−21 must be represented by A, 11100101, since that is the only one with the sign bit set. With three bits for the exponent encoding and four for the main significand encoding, we have an exponent bias of 3, so 110₂ = 6 and represents an exponent of 3, and 0101 in the significand field represents 1.0101₂ = 21/16, so the value represented is −1 • 2³ • 21/16 = −10½, which is half what we expected, −21.
For B, we have 00111001 → 0 011 1001 → +1 • 2³⁻³ • 1.1001₂ → +1 • 1 • 25/16 = 1.5625, which is also half what we expected, 3.125.
For C, we have 00001100 → 0 000 1100 (which is subnormal) → +1 • 2¹⁻³ • 0.1100₂ → +1 • 2⁻² • 12/16 = 3/16, which is again half what we expected, 3/8.
So it is apparent an error was made in constructing the problem: every computed value is half the intended one, which means the bit patterns were constructed with an exponent bias of 2 instead of the 3 that would follow from the IEEE-754 pattern. (Or an equivalent error was made, such as misplacing the radix point in the significand by one position.)