Search code examples
cmathfloating-pointfloating-accuracy

Accurate computation of PDF of standard normal distribution using C standard math library


The probability density function of the standard normal distribution is defined as e-x2/2 / √(2π). This can be rendered in straightforward manner into C code. A sample single-precision implementation might be:

float my_normpdff (float a)
{
    return 0x1.988454p-2f * my_expf (-0.5f * a * a); /* 1/sqrt(2*pi) */
}

While this code is free from premature underflow, there is an accuracy issue since the error incurred in the computation of a2/2 is magnified by the subsequent exponentiation. One can easily demonstrate this with tests against higher-precision references. The exact error will differ based on the accuracy of the exp() or expf() implementations used; for faithfully rounded exponentiation functions one would typically observe a maximum error of around 26 ulps for IEEE-754 binary32 single precision, around 29 ulps for IEEE-754 binary64 double precision.

How can the accuracy issue be addressed in a reasonably efficient manner? A trivial approach would be to employ higher-precision intermediate computation, for example use double computation for the float implementation. But this approach does not work for a double implementation if floating-point arithmetic of higher precision is not easily available, and may be inefficient for float implementation if double arithmetic is significantly more expensive than float computation, e.g. on many GPUs.


Solution

  • The accuracy issue raised in the question can effectively, and efficiently, be addressed by the use of limited amounts of double-float or double-double computation, facilitated by the use of the fused multiply-add (FMA) operation.

    This operation is available since C99 via the standard math functions fmaf(a,b,c) and fma(a,b,c) which compute a*b+c, without rounding of the intermediate product. While the functions map directly to fast hardware operations on almost all modern processors, they may use emulation code on older platforms, in which case they may be be very slow.

    This allows the computation of the product with twice the normal precision using just two operations, resulting in a head:tail pair of native-precision numbers:

    prod_hi = a * b            // head
    prod_lo = FMA (a, b, -hi)  // tail
    

    The high-order bits of the result can be passed to the exponentiation, while the low-order bits are used for improving the accuracy of the result via linear interpolation, taking advantage of the fact that ex is its own derivative:

    e = exp (prod_hi) + exp (prod_hi) * prod_lo  // exp (a*b)
    

    This allows us to eliminate most of the error of the naive implementation. The other, minor, source of computation error is the limited precision with which the constant 1/√(2π) is represented. This can be improved by using a head:tail representation for the constant that provides twice the native precision, and computing:

    r = FMA (const_hi, x, const_lo * x)  // const * x
    

    The following paper points out that this technique can even result in correctly-rounded multiplication for some arbitrary-precision constants:

    Nicolas Brisebarre and Jean-Michel Muller, "Correctly rounded multiplication by arbitrary precision constants", IEEE Transactions on Computers, Vol. 57, No. 2, February 2008, pp. 165-174

    Combining the two techniques, and taking care of a few corner cases involving NaNs, we arrive at the following float implementation based on IEEE-754 binary32:

    float my_normpdff (float a)
    {
        const float RCP_SQRT_2PI_HI =  0x1.988454p-02f; /* 1/sqrt(2*pi), msbs */
        const float RCP_SQRT_2PI_LO = -0x1.857936p-27f; /* 1/sqrt(2*pi), lsbs */
        float ah, sh, sl, ea;
    
        ah = -0.5f * a;
        sh = a * ah;
        sl = fmaf (a, ah, 0.5f * a * a); /* don't flip "sign bit" of NaN argument */
        ea = expf (sh);
        if (ea != 0.0f) ea = fmaf (sl, ea, ea); /* avoid creation of NaN */
        return fmaf (RCP_SQRT_2PI_HI, ea, RCP_SQRT_2PI_LO * ea);
    }
    

    The corresponding double implementation, based on IEEE-754 binary64, looks almost identical, except for the different constant values used:

    double my_normpdf (double a)
    {
        const double RCP_SQRT_2PI_HI =  0x1.9884533d436510p-02; /* 1/sqrt(2*pi), msbs */
        const double RCP_SQRT_2PI_LO = -0x1.cbc0d30ebfd150p-56; /* 1/sqrt(2*pi), lsbs */
        double ah, sh, sl, ea;
    
        ah = -0.5 * a;
        sh = a * ah;
        sl = fma (a, ah, 0.5 * a * a); /* don't flip "sign bit" of NaN argument */
        ea = exp (sh);
        if (ea != 0.0) ea = fma (sl, ea, ea); /* avoid creation of NaN */
        return fma (RCP_SQRT_2PI_HI, ea, RCP_SQRT_2PI_LO * ea);
    }
    

    The accuracy of these implementations depends on the accuracy of the standard math functions expf() and exp(), respectively. Where the C math library provides faithfully-rounded versions of those, the maximum error of either of the two implementations above is typically less than 2.5 ulps.