Explanation of newton's method example on java

http://introcs.cs.princeton.edu/java/13flow/Sqrt.java.html:

public class Sqrt { 
    public static void main(String[] args) { 

        // read in the command-line argument
        double c = Double.parseDouble(args[0]);
        double epsilon = 1e-15;    // relative error tolerance
        double t = c;              // estimate of the square root of c

        // repeatedly apply Newton update step until desired precision is achieved
        while (Math.abs(t - c/t) > epsilon*t) {
            t = (c/t + t) / 2.0;
        }

        // print out the estimate of the square root of c
        System.out.println(t);
    }

}

The thing is..I understand perfectly well how the program works itself. The problem I have is with the equation f(x) = x^2 - c and how that relates to the code above. Like, why divide it by x so that x(x - c/x)? There seems to be a missing mathematical explanation when it comes to some of these examples. In other words, I'm looking for an explanation from a simple mathematical stand point, NOT coding as so much.

Solution

You are given c and you want to solve

t = sqrt(c)

or equivalently,

c = t^2

or then again,

c - t^2 = 0.

I'll call the above equation f(t) = 0 (no mention of c since it is a given constant). Newton method iterates over trial values of t, which I'll label t_i, t_{i+1}, ....

The Taylor expansion to 1st order is:

f(t_i + dt_i) = f(t_i) + dt_i * f'(t_i) + ...

So if you don't quite have f(t_i) = 0, you add a dt_i such that

f(t_i + dt_i) nearly = 0 = f(t_i) + dt_i * f'(t_i) + ...

So dt_i = -f(t_i) / f'(t_i), i.e. f(t_i + -f(t_i) / f'(t_i)) is closer to zero than f(t_i).

If you do the derivatives for f(t) = c - t^2, you'll see that the equation in the code t_{i+1} = (c / t_i + t_i) / 2 is just the iterative formula t_{i+1} = t_i + dt_i with the dt_i estimated above.

This is iterative method, so it does not give an exact solution. You need to decide when you want to stop (sufficient precision), otherwise the algorithm would go on forever. That's why you check f(t_i) < threshold instead of the true f(t_i) = 0. In their case they chose a threshold = epsilon * t^2; I think the multiplication by t^2 was used because if you used a fixed constant as a threshold, you might run into numerical accuracy problems (i.e. if you are playing with trillions, you could never get an fixed accuracy of 10^{-10} due to the finite precision of floating point representation.)