algorithm · math · sqrt

Can a function of a 32-bit square root be used to help calculate a 64-bit square root?


To expand on the idea, let's say I have two 32-bit registers holding the high and low halves of a 64-bit floating-point value, and I want to calculate the 64-bit square root of that value. However, while I don't have a 64-bit square root function, I do have a 32-bit square root function.

My question is this: does having a 32-bit square root at my disposal help me if I want to calculate that 64-bit square root, or am I sort of stuck trying to do a converging loop like Newton-Raphson or something like that?


Solution

  • TL;DR Yes.

    Depending on the capabilities and deficiencies of your platform's hardware, toolchain, and math library, this may not be the fastest or least painful way of computing a double-precision square root. Below I show a straightforward approach based on Arnold Schönhage's coupled iteration for the square root and reciprocal square root:

    Starting with an approximation to the reciprocal square root, r_approx ≈ 1/√a, we compute s_0 = a * r_approx and r_0 = r_approx / 2, then iterate:

    s_{i+1} = s_i + r_i * (a - s_i * s_i)
    r_{i+1} = r_i + r_i * (1 - 2 * r_i * s_{i+1})

    where the s_i are approximations to √a and the r_i are approximations to 1/(2√a). This iteration is the Newton-Raphson iteration cleverly re-arranged, and as such has quadratic convergence, meaning each step will approximately double the number of correct bits. Starting with a single-precision r_approx, only two steps are needed to achieve double-precision accuracy.
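
    As a minimal illustration, the same recurrence can be written in plain C, without the FMA-based rearrangement, special-case handling, and exponent manipulation used in the full implementation below. This sketch is not correctly rounded; it merely shows the coupled iteration refining a single-precision starting value (obtained here via the 32-bit sqrtf()) to roughly double precision in two steps:

    #include <math.h>

    /* Sketch only: plain double arithmetic, no FMA, no argument reduction, no
       handling of zero, negative, infinite, or NaN inputs; the result is close
       to, but not always exactly, the correctly rounded square root. */
    static double coupled_sqrt_sketch (double a)
    {
        double r_approx = (double)(1.0f / sqrtf ((float)a)); /* ~ 1/sqrt(a) */
        double s = a * r_approx;   /* s_0 ~ sqrt(a)       */
        double r = 0.5 * r_approx; /* r_0 ~ 1/(2*sqrt(a)) */
        for (int i = 0; i < 2; i++) {
            s = s + r * (a - s * s);         /* s_{i+1} */
            r = r + r * (1.0 - 2.0 * r * s); /* r_{i+1} */
        }
        return s;
    }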

    If we now take advantage of the fused multiply-add operation (FMA), supported by common modern processors and usually accessible via the function fma(), each half-step requires only two FMAs. As an added bonus, we do not need special rounding logic, because FMA computes a*b+c using the full product a*b, without any intermediate truncation or rounding. The resulting code, given here in an ISO C99 version, is short and sweet:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    #include <fenv.h>
    #include <math.h>
    
    double my_sqrt (double a)
    {
        double b, r, v, w;
        float bb, rr, ss;
        int e, t, f;
    
        if ((a <= 0) || isinf (a) || isnan (a)) {
            /* handle zeros, negative arguments, infinities, and NaNs */
            if (a < 0) {
                r = 0.0 / 0.0; /* square root of a negative argument is NaN */
            } else {
                r = a + a; /* preserves +/-0, +Inf, and NaN arguments */
            }
        } else {
            /* compute exponent adjustments: a = b * 2^(2*f) with b in [0.25,1),
               so that sqrt(a) = sqrt(b) * 2^f */
            b = frexp (a, &e);
            t = e - 2*512;
            f = t / 2;
            t = t - 2 * f;
            f = f + 512;
            /* map argument into the primary approximation interval [0.25,1) */
            b = ldexp (b, t);
            bb = (float)b;
            /* compute single-precision starting approximation to the reciprocal
               square root, using the available 32-bit square root function */
            ss = 1.0f / bb;
            rr = sqrtf (ss);
            r = (double)rr;
            /* Use A. Schoenhage's coupled iteration for the square root */
            v = 0.5 * r;                    /* r_0 = r_approx / 2 */
            w = b * r;                      /* s_0 = b * r_approx */
            w = fma (fma (w, -w, b), v, w); /* s_1 = s_0 + r_0 * (b - s_0 * s_0) */
            v = fma (fma (r, -w, 1), v, v); /* r_1 = r_0 + r_0 * (1 - 2 * r_0 * s_1); note r = 2 * r_0 */
            w = fma (fma (w, -w, b), v, w); /* s_2 = s_1 + r_1 * (b - s_1 * s_1) */
            /* map back from primary approximation interval by jamming exponent */
            r = ldexp (w, f);
        }
        return r;
    }
    
    /* Professor George Marsaglia's 64-bit KISS PRNG */
    static uint64_t xx = 1234567890987654321ULL;
    static uint64_t cc = 123456123456123456ULL;
    static uint64_t yy = 362436362436362436ULL;
    static uint64_t zz = 1066149217761810ULL;
    static uint64_t tt;
    #define MWC64  (tt = (xx << 58) + cc, cc = (xx >> 6), xx += tt, cc += (xx < tt), xx)
    #define XSH64  (yy ^= (yy << 13), yy ^= (yy >> 17), yy ^= (yy << 43))
    #define CNG64  (zz = 6906969069ULL * zz + 1234567ULL)
    #define KISS64 (MWC64 + XSH64 + CNG64)
    
    int main (void)
    {
        volatile union {
            double f;
            unsigned long long int i;
        } arg, res, ref;
        unsigned long long int count = 0ULL;
    
        do {
            arg.i = KISS64;
            ref.f = sqrt (arg.f);
            res.f = my_sqrt (arg.f);
            if (res.i != ref.i) {
                printf ("\n!!!! arg=% 23.16e %016llx  res=% 23.16e %016llx  ref=% 23.16e %016llx\n",
                        arg.f, arg.i, res.f, res.i, ref.f, ref.i);
            }
            count++;
            if ((count & 0xffffff) == 0) printf ("\rtests = %llu", count);
        } while (1);
        return EXIT_SUCCESS;
    }
    

    An exhaustive test of this code across two consecutive binades would take a small cluster of machines about a week or so; here I have included a quick "smoke" test using random operands instead.
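
    For reference, an exhaustive scan works by stepping through the underlying bit patterns of consecutive doubles; the sketch below assumes my_sqrt() from above is in scope and that the interval bounds are positive, finite, and ordered. Because the argument reduction maps every input onto [0.25,1) with t taking only the values -1 and 0, covering two consecutive binades (for example, test_binade (1.0, 2.0) and test_binade (2.0, 4.0)) exercises both cases:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: exhaustively test all doubles in [lo_val, hi_val) by stepping
       through their bit patterns; assumes 0 < lo_val < hi_val, both finite,
       and my_sqrt() from above in scope. Returns 0 if all results match. */
    static int test_binade (double lo_val, double hi_val)
    {
        union { double f; uint64_t i; } lo, hi, x;
        lo.f = lo_val;
        hi.f = hi_val;
        for (x.i = lo.i; x.i < hi.i; x.i++) {
            if (my_sqrt (x.f) != sqrt (x.f)) {
                printf ("mismatch at % 23.16e\n", x.f);
                return 1;
            }
        }
        return 0;
    }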

    On hardware that does not support the FMA operation, fma() will be based on emulation. This is slow, and several such emulations have proven to be faulty. The Schönhage iteration will work just fine without FMA, but additional rounding logic has to be added in that case. Where truncating (round-to-zero) floating-point multiplies are available, the easiest solution is to use Tuckerman rounding. Otherwise it will likely be necessary to re-interpret the double-precision argument and the preliminary result as 64-bit integers and perform the rounding with the help of integer operations.
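
    As a quick, though not conclusive, check whether fma() is likely to map to a hardware instruction on a given platform, the standard C99 macro FP_FAST_FMA from <math.h> can be consulted: when it is defined, the implementation advertises that fma() runs about as fast as a separate multiply and add, which usually indicates hardware FMA support. Whether the macro is defined also depends on the compiler and target options, so treat it only as a hint:

    #include <math.h>
    #include <stdio.h>

    int main (void)
    {
    #if defined(FP_FAST_FMA)
        /* implementation claims fma() is about as fast as a multiply plus add */
        printf ("fma() looks fast here; hardware FMA is likely available\n");
    #else
        /* fma() may fall back to (possibly slow and buggy) software emulation */
        printf ("fma() may be emulated on this platform\n");
    #endif
        return 0;
    }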