I am working on a program that needs to convert a 32-bit number into a decimal number.
The number that I get from input is a 32-bit number represented as floating point: the first bit is the sign, the next 8 bits are the exponent, and the remaining 23 bits are the mantissa. I am writing the program in C. I read the number from input as a char[] array, and then I build a new int[] array in which I store the sign, the exponent and the mantissa. My problem is with the mantissa: I need to store it in some data type, because I have to use it as a number, not as an array, in formula=sign*(1+0.mantissa)*2^(exponent-127).
Here is the code I use to store the mantissa, but still the program gets me wrong results:
double oMantissa=0;
int counter=0;
for(counter=0;counter<23;counter++)
{
    if(mantissa[counter]==1)
    {
        oMantissa+=mantissa[counter]*pow(10,-counter);
    }
}
mantissa[] is an int array into which I have already converted the mantissa from the char array. The value I get from formula has to be a binary number, and I then have to convert it to decimal to get the value of the number. Can you help me with storing the 23 bits of the mantissa? Also, I must not use functions like strtoul that convert the 32-bit number directly into binary; I have to use formula.
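For reference, the loop above weights bit number counter by 10^-counter, whereas each mantissa bit is a binary fraction digit, so bit number counter is worth 2^-(counter+1). A minimal sketch of that weighting follows, assuming mantissa[0] holds the most significant fraction bit and the number is a normal one (exponent field 1..254). The names mantissa_fraction and apply_formula are hypothetical and not part of the original code; no strto*() or pow() calls are used.

```c
/* Hypothetical helper: sums the 23 mantissa bits as a binary fraction.
   Assumes mantissa[0] is the most significant fraction bit (weight 1/2). */
double mantissa_fraction(const int mantissa[23])
{
    double fraction = 0.0;
    double weight = 0.5;        /* 2^-1, the weight of the first bit */
    int i;

    for (i = 0; i < 23; i++)
    {
        if (mantissa[i] == 1)
            fraction += weight;
        weight /= 2;            /* each following bit is worth half as much */
    }
    return fraction;            /* always in the range [0, 1) */
}

/* Hypothetical helper: applies sign*(1+0.mantissa)*2^(exponent-127).
   sign is 0 or 1; exponent is the raw 8-bit field of a normal number. */
double apply_formula(int sign, int exponent, const int mantissa[23])
{
    double value = 1.0 + mantissa_fraction(mantissa);
    int e = exponent - 127;     /* remove the exponent bias */

    while (e > 0)
        value *= 2, e--;
    while (e < 0)
        value /= 2, e++;
    return sign ? -value : value;
}
```

With sign 0, exponent field 133 and mantissa bits 10001001 followed by zeros, the fraction is 0.53515625 and the result is 1.53515625 * 2^6 = 98.25. Subnormal numbers, infinities and NaNs are not covered by this sketch.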
Which part of the below code was hard to get right given all the formulas and sample numbers and a calculator?
#include <stdio.h>
#include <limits.h>
#if UINT_MAX >= 0xFFFFFFFF
typedef unsigned uint32;
#else
typedef unsigned long uint32;
#endif
#define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]
// Ensure uint32 is exactly 32-bit
C_ASSERT(sizeof(uint32) * CHAR_BIT == 32);
// Ensure float has the same number of bits as uint32, 32
C_ASSERT(sizeof(uint32) == sizeof(float));
double Ieee754SingleDigits2DoubleCheat(const char s[32])
{
    uint32 v;
    float f;
    unsigned i;
    char *p1 = (char*)&v, *p2 = (char*)&f;

    // Collect binary digits into an integer variable
    v = 0;
    for (i = 0; i < 32; i++)
        v = (v << 1) + (s[i] - '0');

    // Copy the bits from the integer variable to a float variable
    for (i = 0; i < sizeof(f); i++)
        *p2++ = *p1++;

    return f;
}
double Ieee754SingleDigits2DoubleNoCheat(const char s[32])
{
    double f;
    int sign, exp;
    uint32 mant;
    int i;

    // Do you really need strto*() here?
    sign = s[0] - '0';

    // Do you really need strto*() or pow() here?
    exp = 0;
    for (i = 1; i <= 8; i++)
        exp = exp * 2 + (s[i] - '0');

    // Remove the exponent bias
    exp -= 127;

    // Should really check for +/-Infinity and NaNs here

    if (exp > -127)
    {
        // Normal(ized) numbers
        mant = 1; // The implicit "1."
        // Account for "1." being in bit position 23 instead of bit position 0
        exp -= 23;
    }
    else
    {
        // Subnormal numbers
        mant = 0; // No implicit "1."
        exp = -126; // See your IEEE-754 formulas
        // Account for ".1" being in bit position 22 instead of bit position -1
        exp -= 23;
    }

    // Or do you really need strto*() or pow() here?
    for (i = 9; i <= 31; i++)
        mant = mant * 2 + (s[i] - '0');

    f = mant;

    // Do you really need pow() here?
    while (exp > 0)
        f *= 2, exp--;

    // Or here?
    while (exp < 0)
        f /= 2, exp++;

    if (sign)
        f = -f;

    return f;
}
int main(void)
{
    printf("%+g\n", Ieee754SingleDigits2DoubleCheat("110000101100010010000000000000000"));
    printf("%+g\n", Ieee754SingleDigits2DoubleNoCheat("010000101100010010000000000000000"));
    printf("%+g\n", Ieee754SingleDigits2DoubleCheat("000000000100000000000000000000000"));
    printf("%+g\n", Ieee754SingleDigits2DoubleNoCheat("100000000100000000000000000000000"));
    printf("%+g\n", Ieee754SingleDigits2DoubleCheat("000000000000000000000000000000000"));
    printf("%+g\n", Ieee754SingleDigits2DoubleNoCheat("000000000000000000000000000000000"));
    return 0;
}
Output (ideone):
-98.25
+98.25
+5.87747e-39
-5.87747e-39
+0
+0
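The comment in Ieee754SingleDigits2DoubleNoCheat notes that +/-Infinity and NaNs should really be checked. One possible sketch of that check follows: an exponent field of all ones (255) encodes Infinity when the mantissa is zero and a NaN otherwise. It assumes a C99 compiler where the INFINITY and NAN macros from <math.h> are available; the helper name Ieee754SingleDigitsSpecial and its out parameter are hypothetical and not part of the code above.

```c
#include <math.h>

/* Hypothetical helper: returns 1 and stores the result in *out when the
   32 binary digits in s encode +/-Infinity or a NaN; returns 0 otherwise
   and leaves *out untouched. */
int Ieee754SingleDigitsSpecial(const char s[32], double *out)
{
    int i, exp = 0, mantNonZero = 0;

    // Collect the raw 8-bit exponent field
    for (i = 1; i <= 8; i++)
        exp = exp * 2 + (s[i] - '0');

    if (exp != 255)
        return 0; // Finite: normal, subnormal or zero

    // Any set mantissa bit means NaN
    for (i = 9; i <= 31; i++)
        if (s[i] == '1')
            mantNonZero = 1;

    if (mantNonZero)
        *out = NAN; // Assumes the implementation supports quiet NaNs
    else
        *out = (s[0] == '1') ? -INFINITY : INFINITY;

    return 1;
}
```

A caller could try this helper first and fall back to the finite-number path whenever it returns 0.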