This could be quite a trivial question to answer, but I just wanted to be clear. From the available literature and the discussion in What is the difference between Gradient Descent and Newton's Gradient Descent?, both methods involve computing a derivative and then moving towards a minimum. In the simple gradient-descent method, we calculate only the first-order derivative; in Newton's method, we also calculate the second-order derivatives (the Hessian) and apply its inverse to the gradient when updating the vector. Moreover, the update of the vector in Newton's method may not always be in the direction of the negative gradient.
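To make sure I am reading the update rules correctly, here is a minimal sketch of one step of each method; the function, starting point, and learning rate below are arbitrary choices for illustration only.

```python
import numpy as np

# Made-up 2-D quadratic used only to illustrate the two update rules.
def f(x):
    return x[0]**2 + 3*x[1]**2 + 0.5*x[0]*x[1]

def grad(x):
    return np.array([2*x[0] + 0.5*x[1], 0.5*x[0] + 6*x[1]])

def hessian(x):
    return np.array([[2.0, 0.5],
                     [0.5, 6.0]])

x0 = np.array([4.0, -3.0])

# Gradient descent: step along the negative gradient, scaled by a learning rate.
x_gd = x0 - 0.1 * grad(x0)

# Newton's method: rescale the gradient by the inverse Hessian; the resulting
# step is generally not parallel to the negative gradient. (For this quadratic
# the Newton step happens to land exactly at the minimum.)
x_newton = x0 - np.linalg.solve(hessian(x0), grad(x0))

print(x_gd, x_newton)
```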
Moreover, for a given function f(x), both methods attempt to find a minimum that satisfies f'(x) = 0; in the gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x) = 0. Another difference is the stopping criterion, which in the gradient-descent method is f'(x) = 0, whereas in Newton's method it is f(x) = 0.
Based on the above arguments, would it be justified to say that Newton's method is an (advanced) example of a gradient-based optimisation method? The discussion cited above also falls short of answering this question.
"in the gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x) = 0"
That is not the case: both objectives are f'(x) = 0. With gradient descent, just as with Newton's method, you have no information about whether the minimum you have reached is global or local, so argmin f(x) holds only within a very small neighborhood.
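As a quick illustration, consider a made-up one-dimensional function with two local minima; the function, starting point, learning rate, and iteration count below are all assumptions chosen for this sketch. Both methods, started from the same point, settle at the nearby local minimum rather than the global one, and at that point f'(x) is approximately zero while f(x) itself is nowhere near zero.

```python
# f has a global minimum near x ≈ -1.30 and a shallower local minimum near x ≈ 1.13.
f   = lambda x: x**4 - 3*x**2 + x
df  = lambda x: 4*x**3 - 6*x + 1      # f'(x)
d2f = lambda x: 12*x**2 - 6           # f''(x), the 1-D "Hessian"

x_gd = x_newton = 2.0                 # both start on the right-hand slope
for _ in range(200):
    x_gd     -= 0.01 * df(x_gd)               # gradient-descent step
    x_newton -= df(x_newton) / d2f(x_newton)  # Newton step

# Both end up near the local minimum at x ≈ 1.13, where f'(x) ≈ 0
# but f(x) ≈ -1.07, i.e. far from zero.
print(x_gd, df(x_gd), f(x_gd))
print(x_newton, df(x_newton), f(x_newton))
```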
"Another difference is the stopping criterion, which in the gradient-descent method is f'(x) = 0, whereas in Newton's method it is f(x) = 0"
Again, that is incorrect. Both methods are trying to minimize a cost function f(x), and there is no guarantee whatsoever that the minimum value of f(x) is zero. It could be any arbitrary value, so choosing f(x) = 0 as the stopping criterion would be plainly wrong. A good criterion for stopping both methods is to look at how much f(x) changes over a few consecutive iterations. If it barely changes for a few of them, you may conclude that you have reached a plateau and stop. As alternatives, you can use a criterion such as the absolute value (norm) of the gradient, or, if you have time constraints, just a fixed number of iterations.
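For concreteness, here is a sketch of what such a stopping rule might look like for gradient descent; the function, learning rate, and tolerances below are illustrative assumptions, not recommended values. Note that the minimum value of the example function is 3, not 0, so a criterion of the form f(x) = 0 would never trigger.

```python
import numpy as np

def minimize_gd(f, grad, x0, lr=0.1, tol=1e-8, max_iter=10_000):
    # Gradient descent that stops when f(x) stops changing, the gradient
    # becomes tiny, or the iteration budget runs out.
    x, prev_val = x0, f(x0)
    for _ in range(max_iter):                       # fixed iteration/time budget
        x = x - lr * grad(x)
        val = f(x)
        if abs(prev_val - val) < tol:               # f(x) has plateaued (single-step check for brevity)
            break
        if np.linalg.norm(grad(x)) < tol:           # gradient is essentially zero
            break
        prev_val = val
    return x

# Example: the minimum of f is at (1, -2), where f = 3 (not 0).
f = lambda x: (x[0] - 1)**2 + (x[1] + 2)**2 + 3
g = lambda x: np.array([2*(x[0] - 1), 2*(x[1] + 2)])
print(minimize_gd(f, g, np.array([5.0, 5.0])))
```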
"would it be justified to say that Newton's method is an (advanced) example of a gradient-based optimisation method"
By definition, a gradient method moves in the direction of the (negative) gradient. Newton's method, as you know, uses local curvature to define the path towards a local optimum, and may not follow the direction of the gradient at all, so it just wouldn't make sense to call it gradient-based.
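To see that the two directions genuinely differ, here is a small numerical check on a made-up, ill-conditioned quadratic; the matrix and the point below are arbitrary choices for illustration.

```python
import numpy as np

# f(x) = 0.5 * x^T A x with very different curvature along the two axes.
A = np.array([[100.0, 0.0],
              [  0.0, 1.0]])
x = np.array([1.0, 1.0])

grad = A @ x                              # gradient of f at x
gd_dir = -grad                            # direction used by gradient descent
newton_dir = -np.linalg.solve(A, grad)    # Newton direction: -A^{-1} grad

# Cosine of the angle between the two directions: about 0.71 here (roughly 44°),
# so the Newton step is clearly not just a rescaled gradient step.
cos_angle = gd_dir @ newton_dir / (np.linalg.norm(gd_dir) * np.linalg.norm(newton_dir))
print(gd_dir, newton_dir, cos_angle)
```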