gradient-descent, newtons-method

Would Newton's method be classified as a gradient descent method?


This may be quite a trivial question to answer, but I just want to be clear. From the available literature, and from the discussion in What is the difference between Gradient Descent and Newton's Gradient Descent?, both methods involve computing a derivative and then moving towards a minimum. In the simple gradient-descent method we calculate only the first-order derivative; in Newton's method we also calculate the second-order derivative (the Hessian) and apply it when updating the vector. Moreover, the update of the vector in Newton's method may not always be in the direction of the negative gradient. A small sketch of my understanding of the two update rules is below.
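Here is a minimal sketch of the two update rules on a made-up quadratic objective; the matrix A, the vector b, the learning rate, and the iteration count are all illustrative assumptions, not part of either method.

```python
import numpy as np

# Toy quadratic objective f(x) = 0.5 * x^T A x - b^T x.
# A and b are made up purely for illustration.
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad(x):
    return A @ x - b        # first-order information only

def hess(x):
    return A                # second-order information (constant for a quadratic)

x_gd = np.zeros(2)          # gradient-descent iterate
x_nt = np.zeros(2)          # Newton iterate
lr = 0.1                    # assumed learning rate

for _ in range(200):
    # Gradient descent: step along the negative gradient.
    x_gd = x_gd - lr * grad(x_gd)
    # Newton's method: rescale the gradient by the inverse Hessian.
    x_nt = x_nt - np.linalg.solve(hess(x_nt), grad(x_nt))

print("gradient descent:", x_gd)
print("Newton's method: ", x_nt)
print("exact minimiser: ", np.linalg.solve(A, b))
```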

Furthermore, for a given function f(x), both methods attempt to find a minimum that satisfies f'(x) = 0; in the gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x) = 0. Another difference is the stopping criterion, which in the gradient-descent method is f'(x) = 0, whereas in Newton's method it is f(x) = 0.

Based on the above arguments, would it be justified to say that Newton's method is an (advanced) example of gradient-based optimisation methods? The discussion cited above also falls short of answering this question.


Solution

  • in the gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x) = 0

    That is not the case: both objectives are f'(x) = 0. With gradient descent, just as with Newton's method, you have no information on whether the minimum you have reached is global or local, so argmin f(x) holds only for a very small neighborhood (a small demonstration is sketched at the end of this answer).

    Another difference is the stopping criterion, which in the gradient-descent method is f'(x) = 0, whereas in Newton's method it is f(x) = 0

    Again, that is incorrect. Both methods try to minimize a cost function f(x), and there is no guarantee whatsoever that the minimum value of f(x) is zero. It could be any arbitrary value, so choosing f(x) = 0 as the stopping criterion would simply be wrong. A good criterion for stopping both methods is to look at how much f(x) changes over a few consecutive iterations; if it barely changes, you can conclude that you have reached a plateau and stop. As alternatives, you can use a criterion such as the absolute value (norm) of the gradient, or, if you have time constraints, simply a fixed number of iterations (see the stopping-rule sketch at the end of this answer).

    would it be justified to say that Newton's method is an (advanced) example of gradient-based optimisation methods

    By definition, a gradient method moves in the direction of the (negative) gradient. Newton's method, as you note, uses local curvature to define the path towards a local optimum and may not follow the direction of the gradient at all, so it would not make sense to call it gradient-based. The final sketch below shows the two directions diverging.
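On the first point, that argmin f(x) only holds locally: here is a minimal sketch where gradient descent started from two different points converges to two different stationary points of the same function, and neither run can tell you which one is the global minimum. The quartic f and all hyperparameters are assumptions chosen purely for illustration.

```python
# f(x) = x^4 - 3x^2 + x has two local minima; which one gradient descent
# finds depends entirely on the starting point.
f = lambda x: x**4 - 3 * x**2 + x
df = lambda x: 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x

left = gradient_descent(-2.0)    # ends near x ≈ -1.30 (the global minimum)
right = gradient_descent(2.0)    # ends near x ≈  1.13 (only a local minimum)
print(left, f(left))
print(right, f(right))
```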
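On the stopping criteria: a minimal sketch combining the three checks mentioned above (change in f(x), gradient magnitude, and an iteration cap). The 1-D objective and all thresholds are assumptions for illustration only; note that the run stops with f(x) near 5, never 0.

```python
def minimise(f, grad, x0, lr=0.1, tol_f=1e-12, tol_g=1e-8, max_iter=10_000):
    x, f_prev = x0, f(x0)
    for it in range(max_iter):            # hard cap: time/budget constraint
        x -= lr * grad(x)
        f_curr = f(x)
        if abs(f_prev - f_curr) < tol_f:  # f(x) has stopped changing
            break
        if abs(grad(x)) < tol_g:          # derivative is numerically zero
            break
        f_prev = f_curr
    return x, it

f = lambda x: (x - 3.0) ** 2 + 5.0        # minimum value is 5, not 0
g = lambda x: 2.0 * (x - 3.0)
x_star, iters = minimise(f, g, x0=0.0)
print(x_star, f(x_star), iters)           # x ≈ 3, f(x) ≈ 5, never reaches 0
```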
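Finally, on the direction of the step: a minimal sketch showing that on an ill-conditioned quadratic the Newton step -H⁻¹∇f and the negative gradient -∇f point in clearly different directions. The matrix A is an assumption chosen only to make the effect visible.

```python
import numpy as np

# f(x) = 0.5 * x^T A x, so grad f(x) = A x and the Hessian is A.
A = np.array([[100.0, 0.0],
              [  0.0, 1.0]])

def grad(x):
    return A @ x

x = np.array([1.0, 1.0])
gd_dir = -grad(x)                        # gradient-descent direction
nt_dir = -np.linalg.solve(A, grad(x))    # Newton direction: -H^{-1} grad

cosine = gd_dir @ nt_dir / (np.linalg.norm(gd_dir) * np.linalg.norm(nt_dir))
print("gradient direction:", gd_dir)     # [-100., -1.]
print("Newton direction:  ", nt_dir)     # [-1., -1.]
print("cosine of angle:   ", cosine)     # about 0.71, far from 1
```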