Why doesn't the precision of floating-point data types grow proportionally to their size? E.g.:
#include <iostream>
#include <limits>

int main() {
    std::cout << sizeof(float) << "\n"; // gives 4 on my machine ("debian 64 bit", gcc 6.3.0)
    std::cout << std::numeric_limits<float>::digits10 << "\n"; // gives 6
    std::cout << sizeof(double) << "\n"; // gives 8
    std::cout << std::numeric_limits<double>::digits10 << "\n"; // gives 15
    std::cout << sizeof(long double) << "\n"; // gives 16
    std::cout << std::numeric_limits<long double>::digits10 << "\n"; // gives 18
}
As you can see, the precision of double is about twice the precision of float, and that makes sense, as the size of double is twice the size of float.
But this is not the case between double and long double: the size of long double is 128 bits, twice that of the 64-bit double, yet its precision is only three digits more! I have no idea how floating-point numbers are implemented, but from a rational standpoint, does it even make sense to use 64 more bits of memory for only three extra digits of precision?
I searched around but was not able to find a simple, straightforward answer.
If someone could explain why the precision of long double is only three digits more than that of double, could you also explain why this is not the case between double and float?
I also want to know how I can get better precision without defining my own data type, which would obviously come at the expense of performance.
"Precision" is not all that is to a floating point value. It's also about "magnitude" (not sure if that term is correct though!): How big (or small) can the represented values become?
To see that, also try printing the max_exponent of each type:
std::cout << "float: " << sizeof(float) << "\n";
std::cout << std::numeric_limits<float>::digits << "\n";
std::cout << std::numeric_limits<float>::max_exponent << "\n";
std::cout << "double: " << sizeof(double) << "\n";
std::cout << std::numeric_limits<double>::digits << "\n";
std::cout << std::numeric_limits<double>::max_exponent << "\n";
std::cout << "long double: " << sizeof(long double) << "\n";
std::cout << std::numeric_limits<long double>::digits << "\n";
std::cout << std::numeric_limits<long double>::max_exponent << "\n";
Output on ideone:
float: 4
24
128
double: 8
53
1024
long double: 16
64
16384
So the extra bits are not all used to represent more digits (precision); they also allow the exponent to be larger. In the wording of IEEE 754, long double mostly increases the exponent range rather than the precision.
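To see that larger range directly, here is a minimal sketch printing the largest finite value of each type (the long double value assumes the x86 extended format; exact results depend on your platform):
#include <iostream>
#include <limits>

int main() {
    // max() is governed by the exponent range, not by the digit count
    std::cout << std::numeric_limits<float>::max() << "\n";       // ~3.40282e+38
    std::cout << std::numeric_limits<double>::max() << "\n";      // ~1.79769e+308
    std::cout << std::numeric_limits<long double>::max() << "\n"; // ~1.18973e+4932 on x86
}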
The format shown by my ideone sample above is (probably) the "x86 extended precision format", which assigns 1 bit to the integer part and 63 bits to the fraction part (64 mantissa digits in total), plus a 15-bit exponent field (the encoding is biased rather than signed, giving max_exponent = 2^(15-1) = 16384).
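That also answers the original question arithmetically: decimal precision follows only the mantissa width, roughly digits10 = floor((digits - 1) * log10(2)), and going from double to long double widens the mantissa only from 53 to 64 bits. A quick sketch of that calculation:
#include <cmath>
#include <iostream>

int main() {
    // 24, 53 and 64 are the mantissa widths of float, double and x86 long double
    for (int bits : {24, 53, 64})
        std::cout << bits << " mantissa bits -> "
                  << static_cast<int>((bits - 1) * std::log10(2.0))
                  << " decimal digits\n"; // prints 6, 15 and 18
}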
Note that the C++ standard only requires long double to be at least as precise as double, so long double could be a mere synonym for double, the x86 extended precision format shown above (most likely), or something better (AFAIK only GCC on PowerPC).
I also want to know how I can get better precision without defining my own data type, which would obviously come at the expense of performance.
You need to either write it yourself (surely a learning experience, but best avoided for production code) or use a library, like GNU MPFR or Boost.Multiprecision.
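For example, a minimal sketch using Boost.Multiprecision's cpp_dec_float_50 type, which provides 50 decimal digits (assumes Boost is installed):
#include <boost/multiprecision/cpp_dec_float.hpp>
#include <iomanip>
#include <iostream>
#include <limits>

int main() {
    using boost::multiprecision::cpp_dec_float_50; // 50 decimal digits of precision
    cpp_dec_float_50 third = cpp_dec_float_50(1) / 3;
    std::cout << std::setprecision(std::numeric_limits<cpp_dec_float_50>::digits10)
              << third << "\n"; // 0.33333... to 50 digits
}
Such types are implemented in software rather than in hardware, so expect arithmetic on them to be noticeably slower than on the built-in floating-point types.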