Tags: machine-learning, gradient-descent

Will gradient descent get stuck at a non-minimum point? How can we prove its correctness?


For the "stuck" example, let our cost function be J(x,y) = x * y and suppose we are currently at the point (0,0).

Then the gradient vector is (0,0), which means the gradient descent algorithm will never move us to any other point.
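To make this concrete, here is a minimal sketch (plain NumPy, illustrative only) of running gradient descent on J(x, y) = x * y starting exactly at (0, 0); the step size and iteration count are arbitrary choices:

    import numpy as np

    def grad_J(p):
        x, y = p
        return np.array([y, x])      # dJ/dx = y, dJ/dy = x

    p = np.array([0.0, 0.0])         # start exactly at the saddle point
    alpha = 0.1                      # step size (learning rate), arbitrary

    for _ in range(100):
        p = p - alpha * grad_J(p)    # standard gradient descent update

    print(p)                         # prints [0. 0.]: the iterate never moves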

For the latter question, consider another example: the partial derivative of F(x,y) with respect to x (call it Fx(x,y)) is negative, and the partial derivative with respect to y (call it Fy(x,y)) is also negative. What gradient descent does is move along the vector -alpha * (Fx(x,y), Fy(x,y)). How can we guarantee that F(x - alpha * Fx(x,y), y - alpha * Fy(x,y)) < F(x,y) for any sufficiently small alpha?
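As an illustration (not a proof), the following sketch numerically checks that inequality for a sample function whose partial derivatives are both negative at the chosen point; the specific F and the point (1, 1) are assumptions made only for this check:

    import numpy as np

    def F(x, y):
        return (x - 3.0) ** 2 + (y - 2.0) ** 2   # example F, assumed here

    def grad_F(x, y):
        return np.array([2.0 * (x - 3.0), 2.0 * (y - 2.0)])

    x0, y0 = 1.0, 1.0                # Fx = -4 and Fy = -2 here: both negative
    Fx, Fy = grad_F(x0, y0)

    for alpha in [0.1, 0.01, 0.001]:
        decreased = F(x0 - alpha * Fx, y0 - alpha * Fy) < F(x0, y0)
        print(alpha, decreased)      # prints True for each small alpha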


Solution

  • There is no guarantee that gradient descent will find the global minimum, or even a local minimum. Yes, the algorithm would be stuck at (0,0) as you described, but in practice you will almost never land exactly on (0,0). There are also several techniques that help prevent this from happening, such as random initialization, momentum, and the noise introduced by stochastic (mini-batch) gradients; see the sketch below.
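As a minimal sketch of one such technique, the snippet below uses random initialization: starting from a tiny random perturbation instead of exactly (0, 0), gradient descent on J(x, y) = x * y drifts away from the saddle. The noise scale, step size, and iteration count are arbitrary choices made for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def grad_J(p):
        x, y = p
        return np.array([y, x])      # gradient of J(x, y) = x * y

    p = rng.normal(scale=1e-3, size=2)   # near, but not exactly at, the saddle
    alpha = 0.1

    for _ in range(200):
        p = p - alpha * grad_J(p)

    print(p)                             # the iterate has left (0, 0) far behind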