c++ c floating-point floating-accuracy floating-point-precision

Converting a long double to double with upward (or downward) rounding

Assume that we are working on a platform where the type long double has strictly greater precision than the 64-bit double. What is the fastest way to convert a given long double to an ordinary double-precision number with some prescribed rounding (upward, downward, round-to-nearest, etc.)?

To make the question more precise, here is a concrete example: let N be a given long double, and let M be the double that minimizes the (real) value M - N subject to M >= N. This M is the upward-rounded conversion I want to find.

Can I get away with setting the rounding mode of the FP environment appropriately and performing a simple cast (e.g. (double) N)?

Clarification: You can assume that the platform supports the IEEE Standard for Floating-Point Arithmetic (IEEE 754).
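Both assumptions in the question can be checked from within a program. In this sketch (function names hypothetical), `__STDC_IEC_559__` is the C99 Annex F macro that an implementation defines when it claims IEEE 754 (IEC 60559) conformance, and `LDBL_MANT_DIG` and `DBL_MANT_DIG` from <float.h> give the significand sizes in bits. IEEE double has 53; the x87 80-bit extended type that GCC uses for long double on x86 Linux has 64, whereas MSVC makes long double identical to double, so the second check fails there.

```c
#include <float.h>

/* Returns 1 if the implementation claims IEEE 754 (IEC 60559)
   floating-point conformance via C99 Annex F, else 0. */
int claims_ieee754(void)
{
#ifdef __STDC_IEC_559__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if long double carries more significand bits than double,
   as the question assumes, else 0. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}
```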

Solution

Can I get away with setting the rounding mode of the FP environment appropriately and performing a simple cast (e.g. (double) N)?

Yes, as long as the compiler implements IEEE 754 (most compilers do, at least roughly). IEEE 754 lists conversion from one floating-point format to another among the operations to which the rounding mode applies. To convert a long double to double with upward rounding, set the rounding mode to upward and then do the conversion.

In C99 (C++11 offers the same functions through <cfenv>, but leaves it implementation-defined whether the FENV_ACCESS pragma is supported):

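#include <fenv.h>
/* tell the compiler that this code reads or changes the FP environment */
#pragma STDC FENV_ACCESS ON
…
fesetround(FE_UPWARD);   /* round toward +infinity from now on */
double d = (double) ld;  /* so this conversion rounds upward */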

PS: you may discover that your compiler does not implement #pragma STDC FENV_ACCESS ON properly (GCC, for instance, ignores it; its -frounding-math option is the usual substitute). Welcome to the club.