In trying to write my sinh(x) as double my_sinh(double x), I came across a corner case: when x is in the 709.782... to 710.475... range, how should exp(x)/2 be calculated?
Hyperbolic sine sinh(x) and cosine cosh(x) are both well approximated by exp(x)/2 when x >= 20 (or so, depending on the type details of x).
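A quick sanity check of that approximation in Python (whose float is IEEE binary64):

import math

x = 20.0
# At x = 20, the neglected term exp(-x)/2 is about 1e-9, which is only
# ~0.03 ulp of the result, so the two printed values should match (or
# differ at most in the final digit).
print(math.sinh(x))
print(math.exp(x) / 2)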
With common double, the largest finite value DBL_MAX is about 1.798 x 10^308. The most positive x with a finite sinh(x) and cosh(x) is about 710.475..., while the most positive x with a finite exp(x) is about 709.782.... Since exp(710.0) overflows, the following /2 can be too late to achieve a finite result.
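The cliff is easy to demonstrate in Python (where math.exp raises OverflowError instead of returning the infinity a C exp would):

import math

try:
    print(math.exp(710.0) / 2)   # the /2 never helps: exp overflows first
except OverflowError:
    print("exp(710.0) overflows")
print(math.sinh(710.0))          # finite, about 1.117e+308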
The code could use a type with wider precision/range, like long double, yet long double is not necessarily any better than double, and I am looking for a double-only solution.
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

double my_sinh_f1(double x) {
  assert(x >= 709); // Only concerned about large values for now.
  return (double) (expl(x)/2); // For now, call a long double function.
}
/*
* Over a range of values:
* Compute the difference between f(x) and higher precision sinhl.
* Scale difference to the unit-in-the-last-place (ULP) of the double result.
* Return the worst case ULP.
*/
double test_sinh(double (*f)(double), double x0, double x1, unsigned n) {
  double error_max = 0.0;
  double dx = (x1 - x0)/n;
  for (double x = x0; x <= x1; x += dx) {
    long double y0_as_ld = sinhl(x);
    double y0 = (double) y0_as_ld;
    double ulp = y0 > y0_as_ld ? y0 - nextafter(y0, 0) : nextafter(y0, INFINITY) - y0;
    double y1 = f(x);
    double error = (double) ((y1 - y0_as_ld)/ulp);
    error = fabs(error);
    if (error > error_max) {
      error_max = error;
    }
  }
  return error_max;
}
int main(void) {
  double (*f)(double) = my_sinh_f1;
  double x0 = log(DBL_MAX);
  double x1 = asinh(DBL_MAX);
  unsigned n = 1000000007; // 1000003; // Some prime
  printf("x0:%.3f, x1:%.3f n:%10u error:%g(ulp)\n", x0, x1, n, test_sinh(f, x0, x1, n));
}
The above returns 0.5 for the ULP error, which is not surprising, given the extra-wide sinhl().
(Note: I'm assuming IEEE 754 binary64 format and IEEE 754 semantics, including the default round-ties-to-even rounding mode, throughout.)
Suppose x is a double that lies between 1024 log(2) and 1025 log(2). Then an approximation to exp(x) / 2 can be computed using:

exp(x - c1 - c2) * 0x1p+1023

where c1 and c2 are the following double constants:
double c1 = 0x1.62e42fefa39efp+9;
double c2 = 0x1.acp-46;
The total error in the result should be at most 0.106 ulps on top of whatever error is introduced by the application of exp.
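Assembled into runnable form in Python (this is essentially the metd_sinh function from the test script further down; the name exp_half is mine):

from math import exp

c1 = float.fromhex("0x1.62e42fefa39efp+9")
c2 = float.fromhex("0x1.acp-46")
c3 = float.fromhex("0x1p+1023")  # 2^1023

def exp_half(x: float) -> float:
    """Approximation to exp(x) / 2 for x between 1024 log(2) and 1025 log(2)."""
    return exp(x - c1 - c2) * c3

print(exp_half(710.0))  # finite, about 1.117e+308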
The values c1 and c2 have been chosen so that c1 + c2 (the mathematical sum, not the floating-point one) is a good approximation to 1024 log(2): c1 is simply the closest double to 1024 log(2), while c2 is the closest multiple of 2^-53 to 1024 log(2) - c1. The sum c1 + c2 exceeds 1024 log(2) by an error e ~= 1.173e-17. (We get lucky with the bits here: this error is a bit smaller than we might reasonably have expected a random error to be.)
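One way to reproduce the two constants (a sketch, using Python's Decimal for the high-precision reference):

from decimal import Decimal, getcontext

getcontext().prec = 60
target = 1024 * Decimal(2).ln()        # 1024 log(2), to ~60 digits

c1 = float(target)                     # closest double to 1024 log(2)
# closest multiple of 2^-53 to 1024 log(2) - c1:
c2 = round((target - Decimal(c1)) * 2**53) / 2.0**53

print(c1.hex())                            # 0x1.62e42fefa39efp+9
print(c2.hex())                            # 0x1.ac00000000000p-46, i.e. 0x1.acp-46
print(Decimal(c1) + Decimal(c2) - target)  # e, roughly 1.17e-17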
The two floating-point subtractions involved in evaluating (x - c1) - c2 are exact: the first subtraction is exact by Sterbenz's lemma; the second is exact because it's a subtraction between two quantities that are both integer multiples of 2^-53, and the result lies within the range of multiples of 2^-53 that can be represented exactly, namely [-1.0, 1.0].
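Those exactness claims are easy to spot-check with exact rational arithmetic; for example:

from fractions import Fraction

c1 = float.fromhex("0x1.62e42fefa39efp+9")
c2 = float.fromhex("0x1.acp-46")

x = 710.25  # an arbitrary double in the range of interest
computed = (x - c1) - c2
exact = Fraction(x) - Fraction(c1) - Fraction(c2)  # Fraction(float) is exact
assert Fraction(computed) == exact  # both subtractions were exact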
The multiplication is a multiplication by a power of two, and so will also be exact (barring possible overflow at the top end of the range; but that overflow can only happen if the exp implementation is out by hundreds of ulps at the top end).
As a result of the exact operations, there are only two sources of error: the rounding error arising from the exp computation, and the error arising from the fact that we're computing a slightly different quantity from the one we actually want; we're computing a floating-point approximation to e^(x - c1 - c2), when what we actually want is a floating-point approximation to e^(x - 1024 log(2)).
The value exp(x - c1 - c2) * 0x1p+1023 differs from the mathematically exact value e^(x - c1 - c2) * 2^1023 by an amount depending on how good the libm implementation of exp is: for a perfectly correctly-rounded exp implementation, the error would be less than 0.5 ulp. In reality, we may be able to hope for something good to 0.51 ulp or so.
And since c1 + c2 differs from 1024 log(2) by around 1.173e-17, the mathematically exact value e^(x - c1 - c2) * 2^1023 differs from the value e^(x - 1024 log(2)) * 2^1023 (which is e^x / 2, the value we actually want) by an amount that works out to less than 2^1024 * 1.174e-17, or less than 0.10569 ulps.
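A quick check of that last step (2^971 is one ulp for results in the binade [2^1023, 2^1024), as also used in the test script below):

# error in ulps = (2^1024 * 1.174e-17) / 2^971 = 2^53 * 1.174e-17
print(2.0**53 * 1.174e-17)  # ~= 0.1057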
Using Python, I ran some numerical tests against a few million randomly-chosen values in the range of interest, using high-precision Decimal arithmetic to compute the "exact" value to compare with. The worst case I found was for an input of x = 710.4755889083583 (0x1.633ce018ebdeap+9), with a result that had an absolute error of 0.6223059842974604 ulps. In that particular case, it turned out that exp(x - c1 - c2) was being evaluated as 0x1.ffdc766d23b62p+0, an error of around 0.517 ulps.
Here's the Python code I used for testing:
# https://stackoverflow.com/q/79446335
from decimal import Decimal, getcontext
from math import exp
from random import uniform

# Set a huge Decimal precision, so that the deviation between the Decimal exp
# result and the correctly rounded result is negligible.
getcontext().prec = 200

low = float.fromhex("0x1.62e42fefa39f0p+9")   # smallest value > 1024 log(2)
high = float.fromhex("0x1.633ce8fb9f87dp+9")  # largest value < 1025 log(2)
c1 = float.fromhex("0x1.62e42fefa39efp+9")    # 1024 * log(2) < c1 + c2 < low
c2 = float.fromhex("0x1.acp-46")
c3 = float.fromhex("0x1p+1023")

def metd_sinh(x: float) -> float:
    """Approximation to exp(x) / 2."""
    return exp(x - c1 - c2) * c3

def sinh_accurate(x: float) -> Decimal:
    """Accurate Decimal approximation to exp(x) / 2."""
    return Decimal(x).exp() / 2

def metd_sinh_error(x: float) -> float:
    """Approximate absolute error in the metd_sinh approximation."""
    return float(Decimal(metd_sinh(x)) - sinh_accurate(x))

def exp_error(x: float) -> float:
    """Portion of the error coming from the libm exp implementation."""
    return float(Decimal(metd_sinh(x)) - Decimal(x - c1 - c2).exp() * 2**1023)

# 1 ulp for values in the binade [2^1023, 2^1024).
one_ulp = float.fromhex("0x1p+971")

worst_so_far = 0
for i in range(10**9):
    if i % 10**6 == 0:
        print("Trials: ", i)
    x = uniform(low, high)
    err_in_ulps = metd_sinh_error(x) / one_ulp
    abs_err = abs(err_in_ulps)
    if abs_err > worst_so_far:
        worst_so_far = abs_err
        from_exp = exp_error(x) / one_ulp
        print(f"Error for x={x.hex()[2:]}: {err_in_ulps:.5f} ulps (from exp: {from_exp:.5f} ulps)")
Results from the first 5 million trials of a fresh run:
Trials: 0
Error for x=1.632d564dc418dp+9: -0.06259 ulps (from exp: 0.03099 ulps)
Error for x=1.62e6ec2772555p+9: 0.38075 ulps (from exp: 0.43474 ulps)
Error for x=1.6315175f9016bp+9: 0.39492 ulps (from exp: 0.47235 ulps)
Error for x=1.62fff72d59fe6p+9: -0.46066 ulps (from exp: -0.39501 ulps)
Error for x=1.62e8c92173e14p+9: -0.53450 ulps (from exp: -0.47972 ulps)
Error for x=1.632ef072907bap+9: -0.58970 ulps (from exp: -0.49495 ulps)
Error for x=1.6338605a98b4dp+9: -0.60349 ulps (from exp: -0.50148 ulps)
Error for x=1.633b830ebb408p+9: -0.60918 ulps (from exp: -0.50464 ulps)
Error for x=1.633cbcac70bcfp+9: -0.61047 ulps (from exp: -0.50493 ulps)
Error for x=1.633bfcc25bc09p+9: -0.61100 ulps (from exp: -0.50607 ulps)
Trials: 1000000
Error for x=1.633c08528d339p+9: -0.61384 ulps (from exp: -0.50888 ulps)
Error for x=1.633b611475819p+9: -0.61458 ulps (from exp: -0.51015 ulps)
Trials: 2000000
Trials: 3000000
Error for x=1.633b71a4ae702p+9: -0.61593 ulps (from exp: -0.51145 ulps)
Trials: 4000000
Trials: 5000000
In comments, @chux suggested two modifications to the above, which together result in an improvement in worst-case error (to less than 0.6 ulp in the case of a perfectly correctly-rounded exp implementation). Those modifications are:

- Change c1 and c2 to give a double-double approximation to 1025 log(2) instead of 1024 log(2).
- Drop the requirement of x - c1 - c2 being exact: instead, allow c2 to take full precision (defining it as the closest double to 1025 log(2) - c1, where c1 is the closest double to 1025 log(2)).

With those modifications, the worst-case error is bounded by 0.0910707 ulps on top of whatever is introduced by exp.
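Here's my reading of those modifications as a runnable sketch (the constant derivation mirrors the earlier one; math.ldexp handles the final exact scaling by 2^1024, since 2^1024 itself is not representable as a double; the name exp_half_v2 is mine):

from decimal import Decimal, getcontext
from math import exp, ldexp

getcontext().prec = 60
target = 1025 * Decimal(2).ln()   # 1025 log(2), to ~60 digits

c1 = float(target)                # closest double to 1025 log(2)
c2 = float(target - Decimal(c1))  # closest double to 1025 log(2) - c1 (full precision)

def exp_half_v2(x: float) -> float:
    """Approximation to exp(x) / 2 via exp(x - 1025 log(2)) * 2^1024."""
    # x - c1 - c2 lies in roughly (-log(2), 0), so exp(...) is in (0.5, 1)
    # and the scaling by 2^1024 stays finite.
    return ldexp(exp(x - c1 - c2), 1024)

print(exp_half_v2(710.0))  # finite, about 1.117e+308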
The key to the analysis is to note that while x - c1 - c2 is no longer computed exactly in floating-point, the difference between the computed floating-point value x - c1 - c2 and the real number x - 1025 log(2) depends only on the binade that x - 1025 log(2) lives in: the rounding error from the subtraction is essentially given by the trailing bits of c2. (And we also have to account for the difference between 1025 log(2) and c1 + c2, which is constant but tiny.)
(Note: the below isn't perfectly rigorous, but I believe it could be made so.)
And now we can go case by case:
- If x - 1025 log(2) < -0.5, then x - c1 - c2 lands in the binade (-1.0, -0.5). The rounding error in the second subtraction is exactly 0x1.a6b1642750000p-57, and the worst-case error is around exp(-0.5) times this value, which works out to around 0.0626 ulps.
- If -0.5 < x - 1025 log(2) < -0.25, then x - c1 - c2 is in (-0.5, -0.25) and the rounding error in the second subtraction is the same as in the previous case: 0x1.a6b1642750000p-57. The worst-case error comes from multiplying this by exp(-0.25), giving a bound of 0.0804 ulps.
- If -0.25 < x - 1025 log(2) < -0.125, the rounding error is yet again the same, and the worst-case error is now just very slightly more than exp(-0.125) * 0x1.a6b1642750000p-57, and is bounded by 0.091070607 ulps. This is our worst case.
- If -0.125 < x - 1025 log(2), the rounding error in the second subtraction is now at most -0x1.653a6f62c0000p-59, and the resulting error in our approximation is at most 0.0219 ulps. (Making c2 precise is key here.)

I've verified the above four cases by numerical experimentation.
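For reference, here's a sketch of the kind of spot-check involved, using exact rational arithmetic to extract the rounding error of the second subtraction at one sample point per case (constants re-derived as above; the sign convention may differ from the list, and the fourth region spans several binades, so one sample only illustrates it):

from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 60
target = 1025 * Decimal(2).ln()
c1 = float(target)
c2 = float(target - Decimal(c1))

def sub_error(x: float) -> float:
    """Rounding error of the computed x - c1 - c2 (the first subtraction is exact)."""
    err = Fraction((x - c1) - c2) - (Fraction(x) - Fraction(c1) - Fraction(c2))
    return float(err)  # exactly representable: a tiny multiple of c2's ulp

# one sample per case region: x - 1025 log(2) near -0.6, -0.3, -0.2, -0.05;
# the printed errors should reproduce (up to sign) the constants quoted above
for offset in (0.6, 0.3, 0.2, 0.05):
    x = c1 - offset
    print((x - c1 - c2).hex(), sub_error(x).hex())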