Search code examples
c#algorithmdeterministic

Fast integer sqrt using Math.Sqrt


I'm trying to calculate the square root of integer values. It doesn't need to be very accurate, but it should be fast and deterministic across platforms. I'm using this for a RTS game with lockstep networking.

I'm considering just casting the value to double and use Math.Sqrt, which I assume has hardware acceleration and should be plenty fast. But can I count on this being perfectly deterministic across platforms for all possible inputs (on say a uint64), including the cast back to an integer? From my googling on the subject it seems like floating point sqrt is typically deterministic across platforms (unlike e.g. cos/tan), but does anyone know more about this?


Solution

  • I think you can be assured that it will not be deterministic across all platforms if there are any where the program or the floating point hardware can have its rounding mode altered from the usual settings.

    There is also a more troubling issue when the input value exceeds 2^52 so that the mantissa of a double floating point cannot accurately represent the input value y. For some unlucky values of y rounding to nearest will result in a spurious answer for x that doesn't satisfy x*x <= y. They are quite rare. I sampled them and get 1:10^8 at the onset of trouble and 3:10^7 for numbers >2^63.

    I didn't detect any failures for y < 10^11 but that is a drop in the ocean compared to the total range of an int64. It looks to me like you can safely use sqrt(y) provided that you sanitise the result to guard against the rare exceptions where rounding error causes trouble (and rounding rules or guard digits may vary slightly with some go-faster CPUs).

    The refinement I suggest in integer arithmetic should be fast since it is just one multiply a sanity test and a conditional decrement. This is the test code I put together to take a quick look.

    #include <iostream>
    #include <math.h>
    #include <inttypes.h>
    
    unsigned int safesqrt(uint64_t y)
    {
      // defined to *always* give a deterministic result such that x*x <= y
      // sqrt(y) becomes tricky when y exceeds the 53 bit mantissa of double FP 
      // ironically I think this is one example where x87 80 bit FP would win out!
      // Its 64 bit mantissa can exactly represent every possible integer input.
      // 
      // double results round high for a small fraction of values (approx 1 in 10^8 rising to 3 in 10^7)
      // edge values for y where the mantissa cannot represent the full precision and 
      // sqrt(x) returns a rounded to nearest integer value are the problem
    
      int x = (unsigned) sqrt((double)y);
      uint64_t x2;
      x2 = x;
      x2 *= x2;
      if (x2 > y) x--; // defend against fail high when y > 0x0010 0000 0000 0000, 2^52
      return x;
    }
    
    int main()
    {
      uint64_t base, x2, y, z;
      double dx;
      unsigned x, nrx, nrx2, lastbadx = 0, badcount = 0;
      int i = 1;
      base = 1 ;
      base <<= 62;
      printf("base = %18I64x\n", base);
      printf("     N \t\t  y\t\t\t x^2  \t\t     x (double)     (int) x \t NR(x) \t Nr(NR(x))  bad\n");
      for (y = base; y<base+100000000000; y++)
      {
          dx = sqrt((double)y);
          x = (unsigned)(dx);
          x2 = (uint64_t)x;
          x2 *= x2;
          if (x2 > y)  // in an ideal world this shouldn't happen
          {
              nrx = (int)((y / x + x) / 2);
              nrx2 = (int)((y / nrx + nrx) / 2);  // check to see if NR converges or oscillates
              if (x != lastbadx)
              {
                  printf(" %5u : %20I64u  %20I64u  %18.10gf  %10u %10u %10u", i++, y, x2, dx, x, nrx, nrx2);
                  lastbadx = x;
                  badcount++;
              }
              else
                  badcount++;
          }
          else
          {
              if (badcount) printf(" [%i]\n", badcount);
              badcount = 0;
          }
      }
    }
    

    I left it running overnight and it found about 1400 bad cases in the y>2^63 block. FWIW I found no errors at all in the first 10^11 integer values in just a few minutes (so 10^12, 10^13 would be easily brute force testable). That is still a drop in the ocean compared to the full dynamic range of 10^19.

    You could also defend against fail low by checking that x2+2*x+1 > y but I think that the way that rounding to nearest and hardware sqrt works followed by truncation to integer the chances of that ever triggering are vanishingly small.