Search code examples
calgorithmnewtons-method

Why won't the Newton-Raphson method converge when computing the square roots of 1.0E21 and 1.0E23?


I have the following functions for computing the square root of a number using the Newton-Raphson method.

double get_dx(double y, double x)
{
   return (x*x - y)/(2*x);
}

double my_sqrt(double y, double initial)
{
   double tolerance = 1.0E-6;
   double x = initial;
   double dx;
   int iteration_count = 0;
   while ( iteration_count < 100 &&
           fabs(dx = get_dx(y, x)) > tolerance )
   {
      x -= dx;
      ++iteration_count;
      if ( iteration_count > 90 )
      {
         printf("Iteration number: %d, dx: %lf\n", iteration_count, dx);
      }
   }

   if ( iteration_count < 100 )
   {
      printf("Got the result to converge in %d iterations.\n", iteration_count);
   }
   else
   {
      printf("Could not get the result to converge.\n");
   }

   return x;
}

They work for most numbers. However, they don't converge when y is 1.0E21 and 1.0E23. There might be other numbers, that I don't know of yet, for which the functions don't converge.

I tested with initial values of:

  1. y*0.5 -- something to get started with.
  2. 1.0E10 -- a number in the neighborhood of the final result.
  3. sqrt(y)+1.0 -- A very close number to the final answer, obviously.

In all the cases the functions failed to converge for 1.0E21 and 1.0E23. I tried a lower number, 1.0E19, and a higher number, 1.0E25. Neither of them is a problem.

Here's a complete program:

#include <stdio.h>
#include <math.h>

double get_dx(double y, double x)
{
   return (x*x - y)/(2*x);
}

double my_sqrt(double y, double initial)
{
   double tolerance = 1.0E-6;
   double x = initial;
   double dx;
   int iteration_count = 0;
   while ( iteration_count < 100 &&
           fabs(dx = get_dx(y, x)) > tolerance )
   {
      x -= dx;
      ++iteration_count;
      if ( iteration_count > 90 )
      {
         printf("Iteration number: %d, dx: %lf\n", iteration_count, dx);
      }
   }

   if ( iteration_count < 100 )
   {
      printf("Got the result to converge in %d iterations.\n", iteration_count);
   }
   else
   {
      printf("Could not get the result to converge.\n");
   }

   return x;
}

void test(double y)
{
   double ans = my_sqrt(y, 0.5*y);
   printf("sqrt of %lg: %lg\n\n", y, ans);

   ans = my_sqrt(y, 1.0E10);
   printf("sqrt of %lg: %lg\n\n", y, ans);

   ans = my_sqrt(y, sqrt(y) + 1.0);
   printf("sqrt of %lg: %lg\n\n", y, ans);
}

int main()
{
   test(1.0E21);
   test(1.0E23);
   test(1.0E19);
   test(1.0E25);
}

and here's the output (built on 64 bit Linux, using gcc 4.8.4):

Iteration number: 91, dx: -0.000002
Iteration number: 92, dx: 0.000002
Iteration number: 93, dx: -0.000002
Iteration number: 94, dx: 0.000002
Iteration number: 95, dx: -0.000002
Iteration number: 96, dx: 0.000002
Iteration number: 97, dx: -0.000002
Iteration number: 98, dx: 0.000002
Iteration number: 99, dx: -0.000002
Iteration number: 100, dx: 0.000002
Could not get the result to converge.
sqrt of 1e+21: 3.16228e+10

Iteration number: 91, dx: 0.000002
Iteration number: 92, dx: -0.000002
Iteration number: 93, dx: 0.000002
Iteration number: 94, dx: -0.000002
Iteration number: 95, dx: 0.000002
Iteration number: 96, dx: -0.000002
Iteration number: 97, dx: 0.000002
Iteration number: 98, dx: -0.000002
Iteration number: 99, dx: 0.000002
Iteration number: 100, dx: -0.000002
Could not get the result to converge.
sqrt of 1e+21: 3.16228e+10

Iteration number: 91, dx: 0.000002
Iteration number: 92, dx: -0.000002
Iteration number: 93, dx: 0.000002
Iteration number: 94, dx: -0.000002
Iteration number: 95, dx: 0.000002
Iteration number: 96, dx: -0.000002
Iteration number: 97, dx: 0.000002
Iteration number: 98, dx: -0.000002
Iteration number: 99, dx: 0.000002
Iteration number: 100, dx: -0.000002
Could not get the result to converge.
sqrt of 1e+21: 3.16228e+10

Iteration number: 91, dx: 0.000027
Iteration number: 92, dx: 0.000027
Iteration number: 93, dx: 0.000027
Iteration number: 94, dx: 0.000027
Iteration number: 95, dx: 0.000027
Iteration number: 96, dx: 0.000027
Iteration number: 97, dx: 0.000027
Iteration number: 98, dx: 0.000027
Iteration number: 99, dx: 0.000027
Iteration number: 100, dx: 0.000027
Could not get the result to converge.
sqrt of 1e+23: 3.16228e+11

Iteration number: 91, dx: -0.000027
Iteration number: 92, dx: -0.000027
Iteration number: 93, dx: -0.000027
Iteration number: 94, dx: -0.000027
Iteration number: 95, dx: -0.000027
Iteration number: 96, dx: -0.000027
Iteration number: 97, dx: -0.000027
Iteration number: 98, dx: -0.000027
Iteration number: 99, dx: -0.000027
Iteration number: 100, dx: -0.000027
Could not get the result to converge.
sqrt of 1e+23: 3.16228e+11

Iteration number: 91, dx: 0.000027
Iteration number: 92, dx: 0.000027
Iteration number: 93, dx: 0.000027
Iteration number: 94, dx: 0.000027
Iteration number: 95, dx: 0.000027
Iteration number: 96, dx: 0.000027
Iteration number: 97, dx: 0.000027
Iteration number: 98, dx: 0.000027
Iteration number: 99, dx: 0.000027
Iteration number: 100, dx: 0.000027
Could not get the result to converge.
sqrt of 1e+23: 3.16228e+11

Got the result to converge in 35 iterations.
sqrt of 1e+19: 3.16228e+09

Got the result to converge in 6 iterations.
sqrt of 1e+19: 3.16228e+09

Got the result to converge in 1 iterations.
sqrt of 1e+19: 3.16228e+09

Got the result to converge in 45 iterations.
sqrt of 1e+25: 3.16228e+12

Got the result to converge in 13 iterations.
sqrt of 1e+25: 3.16228e+12

Got the result to converge in 1 iterations.
sqrt of 1e+25: 3.16228e+12

Can anyone explain why the above functions don't converge for 1.0E21 and 1.0E23?


Solution

  • This answer specifically explains why the algorithm fails to converge for an input of 1e23.

    The issue that you're facing is known as the "small difference of large numbers". Specifically, you're computing (x*x - y)/(2*x) where y is 1e23, and x is approximately 3.16e11.

    The IEEE-754 encoding for 1e23 is 0x44b52d02c7e14af6. Hence the exponent is 0x44b, which is 1099 decimal. That has to be reduced by 1023, since the exponent is encoded as offset-binary. Then you have to subtract another 52 to find the weight of an LSB.

    0x44b ==>    1099 - 1023 - 52 = 24    ==> 1 LSB = 2^24
    

    So a single LSB has the value 16777216.

    The result from this code

    double y = 1e23;
    double x = sqrt(y);
    double dx = x*x - y;
    printf( "%lf\n", dx );
    

    is in fact -16777216.

    As a result, when you compute x*x - y, the result is either going to be zero, or it's going to be a multiple of 16777216. Dividing 16777216 by 2*x which is 2*3.16e11 gives an error of 0.000027, which is not within your tolerance.

    The two closest values to the sqrt(y) are

    0x4252682931c035a0
    0x4252682931c035a1
    

    and when you square those numbers you get

    0x44b52d02c7e14af5
    0x44b52d02c7e14af7
    

    So neither one matches the desired result, which is

    0x44b52d02c7e14af6
    

    and hence the algorithm can never converge.


    The reason the algorithm works for 1e25 is due to luck. The encoding for 1e25 is

    0x45208b2a2c280291
    

    The encoding for the sqrt(1e25) is

    0x428702337e304309
    

    and when you square that number, you get

    0x45208b2a2c280291
    

    Hence x*x - y is exactly 0, and the algorithm gets lucky.