Tags: c, floating-point, exp, hyperbolic-function

How can I compute `exp(x)/2` when `x` is large?


In trying to write my own sinh(x), `double my_sinh(double x)`, I came across a corner case.

When x is in the 709.782... to 710.475... range, how can I calculate exp(x)/2?


Hyperbolic sine sinh(x) and cosine cosh(x) are both well approximated by exp(x)/2 when x >= 20 (or so, depending on the type details of x).
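
For example (a quick check, not part of the original question): at x = 20 the discarded exp(-x)/2 term is already far below one ULP of exp(x)/2.

#include <math.h>
#include <stdio.h>

int main(void) {
  double x = 20.0;
  double big = exp(x) / 2;    // dominant term of sinh(x)
  double tiny = exp(-x) / 2;  // discarded term
  double one_ulp = nextafter(big, INFINITY) - big;
  // tiny is only about 0.03 ULP of big, so exp(x)/2 matches sinh(x)
  // to well under 1 ULP, and the agreement improves as x grows.
  printf("exp(-x)/2 = %.3g, 1 ULP of exp(x)/2 = %.3g\n", tiny, one_ulp);
}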

With common double, the largest finite value DBL_MAX is about 1.798...×10^308.
The most positive x with a finite sinh(x) and cosh(x) is about 710.475....
The most positive x with a finite exp(x) is about 709.782....

Since exp(710.0) overflows, the subsequent division by 2 comes too late to yield a finite result.

The code could use a type with wider precision/range, like long double, yet long double is not guaranteed to be any wider than double, so I am looking for a double-only solution.

#include <assert.h>
#include <float.h>  // for DBL_MAX
#include <math.h>
#include <stdio.h>

double my_sinh_f1(double x) {
  assert(x >= 709);  // Only concerned about large values for now.
  return (double) (expl(x)/2);  // For now, call a long double function.
}

/*
 * Over a range of values:
 * Compute the difference between f(x) and higher precision sinhl.
 * Scale difference to the unit-in-the-last-place (ULP) of the double result.
 * Return the worst case ULP.
 */
double test_sinh(double (f)(double), double x0, double x1, unsigned n) {
  double error_max = 0.0;
  double dx = (x1 - x0)/n;
  for (double x = x0; x <= x1; x += dx) {
    long double y0_as_ld = sinhl(x);
    double y0 = (double) y0_as_ld;
    double ulp = y0 > y0_as_ld ? y0 - nextafter(y0,0) : nextafter(y0,INFINITY) - y0;
    double y1 = f(x);
    double error = (double) ((y1 - y0_as_ld)/ulp);
    error = fabs(error);
    if (error > error_max) {
      error_max = error;
    }
  }
  return error_max;
}

int main(void) {
  double (*f)(double) = my_sinh_f1;
  double x0 = log(DBL_MAX);
  double x1 = asinh(DBL_MAX);
  unsigned n = 1000000007;  // Some prime (a smaller alternative: 1000003)
  printf("x0:%.3f, x1:%.3f n:%10u error:%g(ulp)\n", x0, x1, n, test_sinh(f, x0, x1, n));
}

The above reports a worst-case error of 0.5 ULP, which is not surprising, given the extra-wide sinhl().


Solution

  • (Note: I'm assuming IEEE 754 binary64 format and IEEE 754 semantics, including the default round-ties-to-even rounding mode, throughout.)

    Suppose x is a double that lies between 1024 log(2) and 1025 log(2). Then an approximation to exp(x) / 2 can be computed using:

    exp(x - c1 - c2) * 0x1p+1023
    

    where c1 and c2 are the following double constants:

    double c1 = 0x1.62e42fefa39efp+9;
    double c2 = 0x1.acp-46;
    

    The total error in the result should be at most 0.106 ulps on top of whatever error is introduced by the application of exp.
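
    Written out as C (a direct rendering of the formula above with the stated constants; the function name is my own):

    #include <math.h>

    // Valid for x between 1024*log(2) (~709.783) and 1025*log(2) (~710.476).
    double my_sinh_f2(double x) {
      static const double c1 = 0x1.62e42fefa39efp+9;  // closest double to 1024*log(2)
      static const double c2 = 0x1.acp-46;  // closest 2^-53 multiple to 1024*log(2) - c1
      return exp(x - c1 - c2) * 0x1p+1023;  // scaling by 2^1023 is exact
    }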

    Explanation

    The values c1 and c2 have been chosen so that c1 + c2 (the mathematical sum, not the floating-point one) is a good approximation to 1024 log(2): c1 is simply the closest double to 1024 log(2), while c2 is the closest multiple of 2^-53 to 1024 log(2) - c1. The sum c1 + c2 exceeds 1024 log(2) by an error e ~= 1.173e-17. (We get lucky with the bits here: this error is a bit smaller than we might reasonably have expected a random error to be.)
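
    As a quick sanity check on c1 (my own, not part of the original answer): strtod rounds decimal input correctly, so feeding it enough digits of 1024 log(2) must reproduce c1 exactly.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
      // The first 30 digits of 1024*ln(2); strtod rounds them correctly,
      // giving the closest double to 1024*log(2).
      double c1 = strtod("709.782712893383996843245692373", NULL);
      printf("%a\n", c1);  // expected output: 0x1.62e42fefa39efp+9
    }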

    The two floating-point subtractions involved in evaluating (x - c1) - c2 are exact: the first subtraction is exact by Sterbenz's lemma, since x and c1 lie within a factor of two of each other; the second is exact because it's a subtraction between two quantities that are both integer multiples of 2^-53, and the result lies within the range of multiples of 2^-53 that can be represented exactly - namely [-1.0, 1.0].
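
    Those exactness claims are easy to spot-check in code (a sketch of mine): if a floating-point subtraction is exact, adding the subtrahend back must reproduce the original operand bit-for-bit.

    #include <assert.h>

    // Spot-check the exactness of both subtractions for a given x
    // in [1024*log(2), 1025*log(2)].
    void check_exact(double x) {
      const double c1 = 0x1.62e42fefa39efp+9;
      const double c2 = 0x1.acp-46;
      double d1 = x - c1;    // exact by Sterbenz's lemma
      assert(d1 + c1 == x);  // must hold if the subtraction was exact
      double d2 = d1 - c2;   // exact: both are multiples of 2^-53 in [-1.0, 1.0]
      assert(d2 + c2 == d1);
    }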

    The multiplication is a multiplication by a power of two, and so will also be exact (barring possible overflow at the top end of the range; but that overflow can only happen if the exp implementation is out by hundreds of ulps at the top end).

    As a result of the exact operations, there are only two sources of error: the rounding error arising from the exp computation, and the error arising from the fact that we're computing a slightly different quantity from the one we actually want - we're computing a floating-point approximation to e^(x - c1 - c2), when what we actually want is a floating-point approximation to e^(x - 1024 log(2)).

    The value exp(x - c1 - c2) * 0x1p+1023 differs from the mathematically exact value e^(x - c1 - c2) 2^1023 by an amount depending on how good the libm implementation of exp is: for a perfectly correctly-rounded exp implementation, the error would be less than 0.5 ulp. In reality, we may be able to hope for something good to within 0.51 ulp or so.

    And since c1 + c2 differs from 1024 log(2) by around 1.173e-17, the mathematically exact value e^(x - c1 - c2) 2^1023 differs from the value e^(x - 1024 log(2)) 2^1023 (which is e^x / 2, the value we actually want) by a value that works out to less than 2^1024 * 1.174e-17, or less than 0.10569 ulps (one ulp here is 2^971, since the result lies in the binade [2^1023, 2^1024)).

    Numerical testing

    Using Python, I ran some numerical tests against a few million randomly-chosen values in the range of interest, using high-precision Decimal to compute the "exact" value to compare with. The worst case I found was for an input of x = 710.4755889083583 (0x1.633ce018ebdeap+9), with a result that had an absolute error of 0.6223059842974604 ulps. In that particular case, it turned out that exp(x - c1 - c2) was being evaluated as 0x1.ffdc766d23b62p+0, an error of around 0.517 ulps.

    Here's the Python code I used for testing:

    # https://stackoverflow.com/q/79446335
    
    from decimal import Decimal, getcontext
    from math import exp
    from random import uniform
    
    # Set a huge Decimal precision, so that the deviation between the Decimal
    # exp result and the correctly rounded result is negligible.
    getcontext().prec = 200
    
    low = float.fromhex("0x1.62e42fefa39f0p+9")  # smallest value > 1024 log(2)
    high = float.fromhex("0x1.633ce8fb9f87dp+9")  # largest value < 1025 log(2)
    
    c1 = float.fromhex("0x1.62e42fefa39efp+9")  # 1024 * log(2) < c1 + c2 < low
    c2 = float.fromhex("0x1.acp-46")
    c3 = float.fromhex("0x1p+1023")
    
    def metd_sinh(x: float) -> float:
        """Approximation to exp(x) / 2."""
        return exp(x - c1 - c2) * c3
    
    def sinh_accurate(x: float) -> Decimal:
        """Accurate Decimal approximation to exp(x) / 2."""
        return Decimal(x).exp() / 2
    
    def metd_sinh_error(x: float) -> float:
        """Approximate absolute error in the metd_sinh approximation."""
        return float(Decimal(metd_sinh(x)) - sinh_accurate(x))
    
    def exp_error(x: float) -> float:
        """Portion of the error coming from the libm exp implementation."""
        return float(Decimal(metd_sinh(x)) - Decimal(x - c1 - c2).exp() * 2**1023)
    
    
    # 1 ulp for values in the binade [2^1023, 2^1024).
    one_ulp = float.fromhex("0x1p+971")
    
    worst_so_far = 0
    for i in range(10**9):
        if i % 10**6 == 0:
            print("Trials: ", i)
        x = uniform(low, high)
        err_in_ulps = metd_sinh_error(x) / one_ulp
        abs_err = abs(err_in_ulps)
        if abs_err > worst_so_far:
            worst_so_far = abs_err
            from_exp = exp_error(x) / one_ulp
            print(f"Error for x={x.hex()[2:]}: {err_in_ulps:.5f} ulps (from exp: {from_exp:.5f} ulps)")
    

    Results from the first 5 million trials of a fresh run:

    Trials:  0
    Error for x=1.632d564dc418dp+9: -0.06259 ulps (from exp: 0.03099 ulps)
    Error for x=1.62e6ec2772555p+9: 0.38075 ulps (from exp: 0.43474 ulps)
    Error for x=1.6315175f9016bp+9: 0.39492 ulps (from exp: 0.47235 ulps)
    Error for x=1.62fff72d59fe6p+9: -0.46066 ulps (from exp: -0.39501 ulps)
    Error for x=1.62e8c92173e14p+9: -0.53450 ulps (from exp: -0.47972 ulps)
    Error for x=1.632ef072907bap+9: -0.58970 ulps (from exp: -0.49495 ulps)
    Error for x=1.6338605a98b4dp+9: -0.60349 ulps (from exp: -0.50148 ulps)
    Error for x=1.633b830ebb408p+9: -0.60918 ulps (from exp: -0.50464 ulps)
    Error for x=1.633cbcac70bcfp+9: -0.61047 ulps (from exp: -0.50493 ulps)
    Error for x=1.633bfcc25bc09p+9: -0.61100 ulps (from exp: -0.50607 ulps)
    Trials:  1000000
    Error for x=1.633c08528d339p+9: -0.61384 ulps (from exp: -0.50888 ulps)
    Error for x=1.633b611475819p+9: -0.61458 ulps (from exp: -0.51015 ulps)
    Trials:  2000000
    Trials:  3000000
    Error for x=1.633b71a4ae702p+9: -0.61593 ulps (from exp: -0.51145 ulps)
    Trials:  4000000
    Trials:  5000000
    

    Improvements!

    In comments, @chux suggested two modifications to the above, which together result in an improvement in worst-case error (to less than 0.6 ulp in the case of a perfectly correctly-rounded exp implementation). Those modifications are:

    • choose c1 and c2 to give a double-double approximation to 1025 log(2) instead of 1024 log(2)
    • don't insist on the second subtraction in x - c1 - c2 being exact: instead, allow c2 to take full precision (defining it as the closest double to 1025 log(2) - c1, where c1 is the closest double to 1025 log(2))

    With those modifications, the worst-case error is bounded by 0.0910707 ulps on top of whatever is introduced by exp.

    The key to the analysis is to note that while x - c1 - c2 is no longer computed exactly in floating-point, the difference between the computed floating-point value x - c1 - c2 and the real number x - 1025 log(2) depends only on the binade that x - 1025 log(2) lives in: the rounding error from the subtraction is essentially given by the trailing bits of c2. (And we also have to account for the difference between 1025 log(2) and c1 + c2, which is constant but tiny.)
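
    One way to observe that rounding error directly is Knuth's TwoSum, which recovers the exact error of a floating-point addition (a sketch of mine, not from the original answer; c1 and c2 stand for the improved 1025 log(2) constants, whose exact values aren't spelled out above):

    // Knuth's TwoSum: for s = fl(a + b), recover err = (a + b) - s exactly,
    // valid for any magnitudes of a and b under round-to-nearest.
    double two_sum_err(double a, double b) {
      double s = a + b;
      double bb = s - a;
      return (a - (s - bb)) + (b - bb);
    }

    // Usage sketch:
    //   double d1 = x - c1;                 // still exact by Sterbenz's lemma
    //   double err = two_sum_err(d1, -c2);  // exact rounding error of d1 - c2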

    (Note: the below isn't perfectly rigorous, but I believe it could be made so.)

    And now we can go case by case:

    • if x - 1025 log(2) < -0.5 then x - c1 - c2 lands in the binade (-1.0, -0.5). The rounding error in the second subtraction is exactly 0x1.a6b1642750000p-57 and the worst-case error is around exp(-0.5) times this value, which works out to around 0.0626 ulps.
    • if -0.5 < x - 1025 log(2) < -0.25, then x - c1 - c2 is in (-0.5, -0.25) and the rounding error in the second subtraction is the same as in the previous case: 0x1.a6b1642750000p-57. The worst-case error comes from multiplying this by exp(-0.25) to get a bound of 0.0804 ulps.
    • if -0.25 < x - 1025 log(2) < -0.125, the rounding error is yet again the same, and the worst-case error is now just very slightly more than exp(-0.125) * 0x1.a6b1642750000p-57, and is bounded by 0.091070607 ulps. This is our worst case.
    • if -0.125 < x - 1025 log(2), the rounding error in the second subtraction is now at most -0x1.653a6f62c0000p-59 and the resulting error in our approximation is at most 0.0219 ulps. (Making c2 precise is key here.)

    I've verified the above four cases by numerical experimentation.