c nvidia numerical-methods polynomial-math atan2

What is the logic behind this nvidia 's arctan2?

Nvidia has some functions in Cg 3.1 Toolkit Documentation

arctan2 is implemented as follows

float2 atan2(float2 y, float2 x)
{
  float2 t0, t1, t2, t3, t4;

  t3 = abs(x);
  t1 = abs(y);
  t0 = max(t3, t1);
  t1 = min(t3, t1);
  t3 = float(1) / t0;
  t3 = t1 * t3;

  t4 = t3 * t3;
  t0 =         - float(0.013480470);
  t0 = t0 * t4 + float(0.057477314);
  t0 = t0 * t4 - float(0.121239071);
  t0 = t0 * t4 + float(0.195635925);
  t0 = t0 * t4 - float(0.332994597);
  t0 = t0 * t4 + float(0.999995630);
  t3 = t0 * t3;

  t3 = (abs(y) > abs(x)) ? float(1.570796327) - t3 : t3;
  t3 = (x < 0) ?  float(3.141592654) - t3 : t3;
  t3 = (y < 0) ? -t3 : t3;

  return t3;
}

What is the formula or logic behind this ? I couldn't find any references in their libraries.

Solution

The code has 3 parts.

First the input, which can be interpreted as a point (x,y), gets mapped to the sector below the diagonal in the first quadrant. The absolute values and max/min operations effectively act as reflections (or identity) first on the coordinate axes and then on the diagonal.

In the middle then an arcus or inverse tangent approximation for r=y/x is computed. Note that the Taylor series is a = 1*r-1/3*r^3+1/5*r^5-1/7*r^7+... However, the Taylor series is overly correct at the origin and rapidly loses accuracy away from it. Using some fitting procedure, a polynomial that is equally good/bad on the whole interval was determined. Its coefficients are close to the Taylor coefficients, especially in the lower degrees. The polynomial evaluation is most efficiently done via the Horner scheme, where the coefficients are used starting from the highest degree.

And finally, the signs and magnitudes of the input are used to undo the original reflections in reverse order, only that now the angle a gets transformed. So if there was a reflection on the diagonal, a gets mapped to pi/2-a. If there was a reflection on the y axis, a gets mapped to pi-a. And finally in case of a reflection on the x axis, a gets changed to -a.

With some recursive function calls, the procedure in question could also be compactly formulated as (here in Python)

def atan2(y,x):
    if y<0: return -atan2(-y,x)
    if x<0: return pi-atan2(y,-x)
    if x<y: return 0.5*pi-atan2(x,y)
    return p(y/x)

where the polynomial is evaluated as

def p(r):
    r2 = r*r
    res =        - 0.013480470  # *r^11
    res = res*r2 + 0.057477314  # *r^9
    res = res*r2 - 0.121239071  # *r^7 
    res = res*r2 + 0.195635925  # *r^5 
    res = res*r2 - 0.332994597  # *r^3 
    res = res*r2 + 0.999995630  # *r^1 
    return r*res

For comparison the Taylor series can be implemented as

def a11(r):
    r2 = r*r
    res = 0
    for k in range(11,0,-2):
        res = 1/k-r2*res
    return r*res

To compare the errors of both approximations in one plot, use a logarithmic vertical axis, the Taylor error grows too fast.

r = np.linspace(0,1,500)
plt.semilogy(r,abs(p(r)-np.arctan(r)), r, abs(a11(r)-np.arctan(r)))
plt.legend(["residual of p", "residual of Taylor"])
plt.grid(); plt.show()

This gives the error plot

which shows the described error behavior of the minimaxed polynomial and the Taylor polynomial of equal degree.