c++ gcc floating-point precision long-double

Precision of floating-point data types in C++


Why doesn't the precision of floating-point data types grow in proportion to their size? E.g.:

std::cout << sizeof(float) << "\n";  // gives 4 on my machine (64-bit Debian, GCC 6.3.0)
std::cout << std::numeric_limits<float>::digits10  << "\n"; // gives 6

std::cout << sizeof(double) << "\n";  // gives 8
std::cout << std::numeric_limits<double>::digits10 <<  "\n"; // gives 15

std::cout << sizeof(long double) <<  "\n";  // gives 16
std::cout << std::numeric_limits<long double>::digits10  << "\n"; // gives 18

As you can see, the precision of double is roughly twice that of float, which makes sense because double is twice the size of float.

But that is not the case between double and long double: long double is 128 bits, twice the size of the 64-bit double, yet its precision is only three digits more!

I have no idea how floating-point numbers are implemented, but from a rational standpoint, does it even make sense to use 64 more bits of memory for only three more digits of precision?

I searched around but could not find a simple, straightforward answer. Could someone explain why long double has only three more digits of precision than double, and why this differs from the step between float and double?

I also want to know how I can get better precision without defining my own data type, which would obviously come at the expense of performance.


Solution

  • "Precision" is not all there is to a floating-point value. There is also its "magnitude" (more commonly called the range): how big (or small) can the represented values become?

    For that, try also printing digits (the number of binary digits in the significand) and max_exponent for each type:

    std::cout << "float: " << sizeof(float) << "\n";
    std::cout << std::numeric_limits<float>::digits << "\n";
    std::cout << std::numeric_limits<float>::max_exponent << "\n";
    
    std::cout << "double: " << sizeof(double) << "\n";
    std::cout << std::numeric_limits<double>::digits << "\n";
    std::cout << std::numeric_limits<double>::max_exponent << "\n";
    
    std::cout << "long double: " <<  sizeof(long double) << "\n";
    std::cout << std::numeric_limits<long double>::digits << "\n";
    std::cout << std::numeric_limits<long double>::max_exponent << "\n";
    

    Output on ideone:

    float: 4
    24
    128
    double: 8
    53
    1024
    long double: 16
    64
    16384
    

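    So the extra bits are not all used to represent more digits (precision); some of them allow the exponent to be larger. From double to long double the significand grows by only 11 bits (53 to 64), which is worth about three decimal digits, while max_exponent grows from 1024 to 16384. In IEEE 754 terms, long double here extends the exponent range far more than it extends the precision.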

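    The output of my ideone sample above (probably) shows the "x86 extended precision format", which uses 1 sign bit, 15 bits for the exponent (stored with a bias rather than with a separate sign bit, giving max_exponent = 2^(15-1) = 16384), and 64 bits for the significand (1 bit for the integer part plus 63 bits for the fraction part). That is only 80 bits in total: with GCC on x86-64, the remaining 48 bits of the 16-byte long double are alignment padding and carry no information, which is the main reason doubling the size does not double the precision.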

    Note that the C++ standard only requires long double to be at least as precise as double, so long double may have the same representation as double, use the x86 extended precision format shown above (most likely), or be something wider (AFAIK GCC on PowerPC, for example).

    "And I also want to know how I can get better precision without defining my own data type, which would obviously come at the expense of performance?"

    You need to either write it yourself (surely a learning experience, but best avoided in production code) or use a library such as GNU MPFR or Boost.Multiprecision.