I'm trying to convert the number 170.3 into an IEEE 754 binary32 float:
You can see my working from the images below:
So 170 in Binary is 10101010
We can see that the pattern 1001 will repeat forever, so we have something like
0.3 = 0.0(1001), where the part in parentheses is recurring
When we put these numbers together, we can get the binary representation of the whole value:
170.3 = 10101010.0(1001)
where the part in parentheses is recurring.
170.3 = 1.01010100(1001) x 2⁷
This is how our 4 bytes (32 bits) are allocated:
So we can combine these together to get the binary data to store into our 4 bytes (32 bits):
01000011001010100100110011001100
Which, when split into 4 bytes should be:
01000011-00101010-01001100-11001100
I then try and run this C++ program, which stores the float and prints the memory:
#include <stdio.h>   /* printf, size_t */
#include <iostream>

/* Prints the contents of a memory block as bits, most significant byte first */
static void print_bytes(const void *object, size_t size){
#ifdef __cplusplus
    const unsigned char * const bytes = static_cast<const unsigned char *>(object);
#else // __cplusplus
    const unsigned char * const bytes = object;
#endif // __cplusplus
    size_t i;

    printf("[-");
    for(i = 0; i < size; i++)
    {
        int binary[8];
        /* Bytes are read from the highest address down, which gives the
           most significant byte first on a little-endian machine */
        for(int n = 0; n < 8; n++){
            binary[7-n] = (bytes[size -1 - i] >> n) & 1;
        }
        /* print result */
        for(int n = 0; n < 8; n++){
            printf("%d", binary[n]);
        }
        printf("%c", '-');
    }
    printf("]\n\n");
}

int main () {
    std::cout << "\nStoring a Float in Memory";
    std::cout << "\n----------------------------\n\n";
    float height = 170.3f;
    std::cout << "Address is "<< &height << "\n\n";
    std::cout << "Size is "<< sizeof(height) << " bytes\n\n";
    std::cout << "Value is " << height << "\n\n";
    std::cout << "Memory Blocks : \n";
    print_bytes(&height, sizeof(height));
    return 0;
}
But in the output, I can see that the last bit is a 1 and not a 0 as per my calculations:
And also, when using online converters, the last bit also becomes a 1:
Could someone please explain to me where I went wrong in my calculation?
OP did not properly account for rounding.
Typically, conversion uses the rounded value (round to nearest, ties to even).
 12345678 9012345678901234
+10101010.                            170
        0.0100110011001100 1 1001...  0.3
+10101010.0100110011001100 1 1001...  Sum
                           v vvvvvvv
                           1 |        extra bit past the 24
                             1        "or" of the rest of the bits
+10101010.0100110011001100 1 1        Value prior to rounding
^                        ^ ^ ^        These 4 bits & rounding mode determine round value
+                        1            Round value to add (assume round to nearest, ties to even)
+10101010.0100110011001101            Sum
+ 0101010.0100110011001101            23-bit portion explicitly stored
Amend the algorithm to also keep 1) one more bit, the "24th" bit (counting from the 0th bit), and 2) the "or" of all the less significant bits (25th, 26th, etc.).
From these 2 bits, together with the least significant bit, the sign bit and the rounding mode, the proper rounding value can be determined.