Assume that we are working on a platform where the type long double has strictly greater precision than a 64-bit double. What is the fastest way to convert a given long double to an ordinary double-precision number with some prescribed rounding (upward, downward, round-to-nearest, etc.)?
To make the question more precise, let me give a concrete example: let N be a given long double, and let M be the double that minimizes the (real) value M - N subject to M >= N. This M is the upward-rounded conversion I want to find.
Can I get away with setting the rounding mode of the FP environment appropriately and performing a simple cast (e.g. (double) N)?
Clarification: You can assume that the platform supports the IEEE Standard for Floating-Point Arithmetic (IEEE 754).
Can I get away with setting the rounding mode of the FP environment appropriately and performing a simple cast (e.g. (double) N)?
Yes, as long as the compiler implements IEEE 754 (most of them do, at least roughly). Conversion from one floating-point format to another is one of the operations to which, according to IEEE 754, the rounding mode should apply. To convert from long double to double rounding up, set the rounding mode to upward and do the conversion.
In C99, which C++ compilers should also accept (C++11 exposes the same functions through <cfenv>, but whether the pragma is honored there is implementation-defined):
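#include <fenv.h>
#pragma STDC FENV_ACCESS ON   /* tell the compiler the FP environment is accessed */
…
fesetround(FE_UPWARD);        /* round toward +infinity from here on */
double d = (double) ld;       /* the conversion honors the current rounding mode */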
PS: you may discover that your compiler does not implement #pragma STDC FENV_ACCESS ON properly. Welcome to the club.