calgorithmfloating-pointfloating-accuracy# Fast implementation of complementary error function with 5-digit accuracy

[While this is a self-answered question, I will happily upvote and accept any alternative answer that either provides superior accuracy for the same amount of computational work or reduces the amount of computational work while maintaining the same level of accuracy.]

I have previously demonstrated how to compute the complementary error function `erfcf()`

with a maximum error of less than three ulps. This can serve as a building block for additional functions, such as the CDF of the standard normal distribution Φ(x) = ½ erfc (-√½ x) or the Gaussian Q-function, Q(x) = 1-Φ(x) = ½ erfc (√½ x). However, for some use cases computation fully accurate to single precision is not needed while the contribution of `erfc()`

evaluations to total run-time is not negligible.

The literature provides various low-accuracy approximations to the complementary error function, but they are either limited to a subset of the full input domain, or optimized for absolute error, or computationally too complex, for example by requiring multiple invocations of transcendental functions. How could one go about implementing `erfcf()`

with high performance and with a relative accuracy of about 5 decimal digits *across the entire input domain*?

Solution

How could one go about implementing

`erfcf()`

with high performance and with a relative accuracy of about 5 decimal digits across the entire input domain?

This answer, like OP's answer, does achieve *about* 5 decimal digits relative accuracy^{*1}. To achieve *at least* 5 likely obliges another term in the rational polynomial - which I doubt will slow the performance much.

**Special values**

In addition to performance and accuracy, the result of a `my_erfc(x)`

for select values should be considered.

For `x`

near 0.0, the result should be 1.0 exactly. This follows principle of least surprise.

Example: A user may readily rely on `my_cos(0.0) == 1.0`

, or `my_exp(0.0) == 1.0`

, so should `my_erfc(0.0) == 1.0`

Fortunately this is not hard to accomplish when a rational polynomial is used with coefficients of 1.0. See below.

**Take advantage of the linear region**

About 45% of all `float`

are such that the `erfc()`

is a simple linear equation with high accuracy.

```
#define TWO_OVER_ROOT_PI 1.1283791670955125738961589031215f
#define ERFC_SMALL 0.0053854f
float fast_erfcf_alt(float x) {
if (fabsf(x) < ERFC_SMALL) {
return 1.0f - TWO_OVER_ROOT_PI * x;
}
...
}
```

**Define performance rating**

One could rate code in a worst case, what is the slowest for all `float x`

?

I suspect OP is more interested in an assessment from `x= 0.0`

up to when `erfc(x)`

is 0.0 (~`x == 10`

) in a *linear* progression.

Sample performance results and test code follows. Of course, hard to beat the C library's crafted code.

```
0 time: 0.594 erfcf (Library function)
1 time: 0.703 fast_erfcf (OP's answer)
2 time: 0.688 fast_erfcf_alt (This answer)
```

```
int main(void) {
float (*f[3])(float) = {erfcf, fast_erfcf, fast_erfcf_alt };
for (int i = 0; i < 3; i++) {
clock_t c0, c1;
c0 = clock();
float dx = 10.0f - nextafterf(10, 0);
for (float x = 10.0f; x >= 0.0f; x -= dx) {
f[i](x);
}
c1 = clock();
printf("%d time: %g\n", i, (double) (c1 - c0) / CLOCKS_PER_SEC);
}
}
```

Following code is modeled after OP self answer and incorporates ideas mentioned above.

```
#define TWO_OVER_ROOT_PI 1.1283791670955125738961589031215f
#define ERFC_SMALL 0.0053854f
float fast_erfcf_alt(float x) {
if (fabsf(x) < ERFC_SMALL) {
return 1.0f - TWO_OVER_ROOT_PI * x;
}
float a, c, e, p, q, r, s;
a = fabsf(x);
c = fminf(a, 10.5f);
s = -c * c;
#if USE_BUILTIN_EXP
e = expf(s);
#else // USE_BUILTIN_EXP
e = my_expf (s);
#endif // USE_BUILTIN_EXP
q = 0.374177223624056f;
p = -5.00032254520701E-05f;
q = q * c + 1.29051354328887f;
p = p * c + 0.212358010453875f;
q = q * c + 1.84437448399707f;
p = p * c + 0.715675302663111f;
q = q * c + 1.0f;
p = p * c + 1.0f;
r = e / q;
r = r * p;
if (x < 0.0f)
r = 2.0f - r;
if (isnan(x))
r = x + x;
return r;
}
```

Note: I'd consider eliminating `c = fminf(a, 10.5f);`

.

This code's error results using OP's answer test harness.

(`#define USE_FMA (0)`

and `#define USE_BUILTIN_EXP (1))`

.

It's relative error (1.20e-05) vs. OP's error (1.06e-05) is primarily due to using `1.0`

in both polynomials `p`

and `q`

.

Other code could use OP's answer's `p`

and `q`

and this answer's `if (fabs(x) < ERFC_SMALL)`

for a best of both. Additional work needed to see how well the linear and rational polynomial sections meet.

```
* This answer
* max rel err = 1.203212e-05 @ 9.19353199e+00
* max abs err = 9.690295e-06 @ -7.19872937e-02
* max ulp err = 1.877379e+02 @ 8.03564072e+00
*
* OP's answer
* max rel err = 1.061202e-05 @ 4.35268253e-01
* max abs err = 9.398669e-06 @ -9.19871107e-02
* max ulp err = 1.755724e+02 @ 2.46583080e+00
```

As an extra, below is a graphical view of OP's error over the [0... ~10] range. I found it illuminating to show where errors develop.

^{*1} Low relative accuracy is not possible in the sub-normal range of answers.

- Can someone explain this to me, i'm so confused
- Make recompiles non-updated file using Windows
- On RPi 4 GCC and CLANG use 16-bit loads
- Calling C from COBOL: trouble with stderr
- how to create an array that stores prime numbers?
- Converting a flowchart to a program in C language (I'm new to programming and need help)
- Speed up writing multiple image TIFF?
- Difference between static and shared libraries?
- PRE-2016 Valgrind: Still Reachable Leak detected by Valgrind
- My function for printing my linked list isn't printing it
- Getting a weird percent sign in printf output in terminal with C
- 0xC0000374: I've been getting error in my school C coding project and I'm unsure what to change
- What's the difference between pressing F5 and clicking the run button (at the top right) on a C or C++ project in VS Code?
- correct context switching under x86-64 / Windows
- cannot find entry symbol _start when trying to link c main with assembly function
- ISO_C_BINDING, calling C from Fortran
- Can't assign a tree node into a stack
- How does a C-style struct with a bitfield get represented in a Rust #[repr(C)] struct?
- want to understand pointer to struct and void pointer conversion
- segfault in using getline() function in C
- Register access suppresses otherwise possible optimization (avr-gcc)
- Invoking C API consisting of buffer pointer parameter using C#
- Where is the header file of emscripten/bind.h?
- Are C/C++ fundamental types atomic?
- While and for loops not giving the right answer
- Python ctypes passing string pointer / buffer
- Bad BLX instruction generated when calling asm function from C function (gcc on STM32H753)
- Tainted string in C
- Error while configuring CMake project: Running 'nmake' '-?' failed
- Regarding background processes using fork() and child processes in my dummy shell