machine-learning · neural-network · deep-learning · backpropagation · gradient-descent

Gradient Descent without derivative


So I'm trying to understand gradient descent and I'm confused. Suppose the loss as a function of a weight is a parabola. Instead of taking the derivative at the point x we are currently at, why not just find the vertex of the parabola directly?


Solution

  • You can. If your loss function really is a parabola (or some other conveniently convex function), you can solve for the minimum in closed form; for L(w) = a*w^2 + b*w + c the vertex sits at w = -b / (2a). But in practice your loss function is almost certainly non-convex and extremely complex, and you don't know its shape a priori. All you can do is evaluate the loss and its gradient at the point you're currently at, which is why gradient descent works the way it does: it repeatedly samples the local slope and steps downhill. The convenient parabolas you see in tutorials are just simplified illustrations.
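
Here's a minimal sketch of the contrast, assuming a toy quadratic loss with made-up coefficients a, b, c and an arbitrary learning rate. The closed-form line uses the vertex formula; the loop only ever queries the gradient at the current point, which is all gradient descent needs even when no closed form exists:

```python
# Hypothetical quadratic loss L(w) = a*w^2 + b*w + c (coefficients chosen for illustration).
a, b, c = 2.0, -8.0, 3.0

# Closed form: the vertex of a parabola sits at w = -b / (2a).
w_closed_form = -b / (2 * a)

def grad(w):
    # dL/dw -- gradient descent only assumes we can evaluate this at a point,
    # not that we know the global shape of the loss.
    return 2 * a * w + b

w = 0.0    # arbitrary starting point
lr = 0.1   # illustrative learning rate
for _ in range(100):
    w -= lr * grad(w)  # step downhill along the sampled slope

print(f"closed-form vertex: {w_closed_form}")  # 2.0
print(f"gradient descent:   {w:.6f}")          # converges to ~2.0
```

For this parabola both approaches land on the same minimum, but only the loop generalizes to losses where no vertex formula exists.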