I have an an algorithm for calculating the floating point square root divide using the newton-raphson algorith. My results are not fully accurate and sometimes off by 1 ulp.
I was wondering if there is a refinement algorithm for floating point division to get the final bits of accuracy. I use the tuckerman test for square root, but is there a similar algorithm for division? Or can the tuckerman test be adapted for division?
I tried using this algorithm too but didn't get full accuracy results:
z= divisor
r_temp = divisor*q
r = dividend - r_temp
result_temp = r*z
q + result_temp
One practical way of correctly rounding the result of iterative division is to produce a preliminary quotient to within one ulp of the mathematical result, then use the exactly-computed residual to compute the final result.
The tool of choice for the exact computation of residuals is the fused-multiply add (FMA) operation. Much of the foundational work of this approach (both in terms of the mathematics and of practical implementations) is due to Peter Markstein and was later refined by other researchers. Markstein's results are nicely summarized in his book:
Peter Markstein, IA-64 and Elementary Functions: Speed and Precision. Prentice-Hall 2000.
A straightforward approach to correctly-rounded division using Markstein's approach is to first compute a correctly-rounded reciprocal, then compute the correctly-rounded quotient by multiplying it with the dividend, followed by the final residual-based rounding step.
The residual can be used to compute the final rounded result directly, as is shown for the quotient rounding in the code below (I noticed that this code sequence resulted in an incorrectly rounded result in one out of 1011 divisions, and replaced it with another instance of the comparison-and-select idiom) which is the technique used by Markstein. Alternatively it may be used as part of a two-sided comparison-and-select process somewhat akin to Tuckerman rounding, which is shown for the reciprocal rounding in the code below.
There is one caveat with regard to the reciprocal computation. Many commonly used iterative approaches (including the one I used below), when combined with Markstein's rounding technique, deliver an incorrect result if the mantissa of the divisor consists entirely of 1-bits.
One way of getting around this is to treat this case specially. In the code below I instead opted for a two-sided comparison-and-select approach, which also allows errors slightly larger than one ulp prior to rounding and thus eliminates the need to use FMA in the reciprocal iteration itself.
Please note that I omitted the handling of sub-normal results in the C code below to keep the code concise and easy to follow. I have limited myself to standard C library functions for tasks like extracting parts of floating-point operands, assembling floating-point numbers, and applying one-ulp increments and decrements. Most platforms will offer machine-specific options with higher performance for these.
float my_divf (float a, float b)
{
float q, r, ma, mb, e, s, t;
int ia, ib;
if (!isnanf (a+b) && !isinff (a) && !isinff (b) && (b != 0.0f)) {
/* normal cases: remove sign, split args into exponent and mantissa */
ma = frexpf (fabsf (a), &ia);
mb = frexpf (fabsf (b), &ib);
/* minimax polynomial approximation to 1/mb for mb in [0.5,1) */
r = - 3.54939341e+0f;
r = r * mb + 1.06481802e+1f;
r = r * mb - 1.17573657e+1f;
r = r * mb + 5.65684575e+0f;
/* apply one iteration with cubic convergence */
e = 1.0f - mb * r;
e = e * e + e;
r = e * r + r;
/* round reciprocal to nearest-or-even */
e = fmaf (-mb, r, 1.0f); // residual of 1st candidate
s = nextafterf (r, copysignf (2.0f, e)); // bump or dent
t = fmaf (-mb, s, 1.0f); // residual of 2nd candidate
r = (fabsf (e) < fabsf (t)) ? r : s; // candidate with smaller residual
/* compute preliminary quotient from correctly-rounded reciprocal */
q = ma * r;
/* round quotient to nearest-or-even */
e = fmaf (-mb, q, ma); // residual of 1st candidate
s = nextafterf (q, copysignf (2.0f, e)); // bump or dent
t = fmaf (-mb, s, ma); // residual of 2nd candidate
q = (fabsf (e) < fabsf (t)) ? q : s; // candidate with smaller residual
/* scale back into result range */
r = ldexpf (q, ia - ib);
if (r < 1.17549435e-38f) {
/* sub-normal result, left as an exercise for the reader */
}
/* merge in sign of quotient */
r = copysignf (r, a * b);
} else {
/* handle special cases */
if (isnanf (a) || isnanf (b)) {
r = a + b;
} else if (b == 0.0f) {
r = (a == 0.0f) ? (0.0f / 0.0f) : copysignf (1.0f / 0.0f, a * b);
} else if (isinff (b)) {
r = (isinff (a)) ? (0.0f / 0.0f) : copysignf (0.0f, a * b);
} else {
r = a * b;
}
}
return r;
}