optimization machine-learning gradient-descent

Update equation for gradient descent

Suppose we have an approximation function y = f(w,x), where x is the input, y is the output, and w is the weight. According to the gradient descent rule, we should update the weight according to w = w - df/dw. But is it possible to update the weight according to w = w - w * df/dw instead? Has anyone seen this before? I want to do this because it is easier to implement this way in my algorithm.

Solution

Recall that gradient descent is based on the Taylor expansion of f(w, x) in the close vicinity of w, and that its purpose, in your context, is to repeatedly modify the weight in small steps. The negative gradient direction is just a search direction, based on very local knowledge of the function f(w, x).

Usually the iterative update of the weight includes a step length, yielding the expression

w_(i+1) = w_(i) - nu_i df/dw,

where df/dw is evaluated at w_(i) and the value of the step length nu_i is found by using line search, see e.g. https://en.wikipedia.org/wiki/Line_search.
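
As a rough illustration of the step-length update above, here is a minimal Python sketch that picks the step length with a backtracking (Armijo) line search on a one-dimensional example. The objective (w - 3)^2, the function names, and the constants nu0, shrink and c are all assumptions made for illustration, not something taken from the question, and backtracking is only one of several line search strategies.

```python
def f(w):
    # Hypothetical objective with its minimum at w = 3.
    return (w - 3.0) ** 2


def grad(w):
    # Derivative df/dw of the objective above.
    return 2.0 * (w - 3.0)


def backtracking_step_length(f, grad, w, nu0=1.0, shrink=0.5, c=1e-4):
    # Start from a trial step length nu0 and shrink it until the
    # Armijo sufficient-decrease condition holds:
    #   f(w - nu * g) <= f(w) - c * nu * g^2
    g = grad(w)
    nu = nu0
    while f(w - nu * g) > f(w) - c * nu * g * g:
        nu *= shrink
    return nu


def gradient_descent(f, grad, w0, iters=50):
    # Repeats w_(i+1) = w_(i) - nu_i * df/dw, with nu_i from the line search.
    w = w0
    for _ in range(iters):
        nu = backtracking_step_length(f, grad, w)
        w = w - nu * grad(w)
    return w


w_final = gradient_descent(f, grad, w0=10.0)
print(w_final, f(w_final))
```

The point of the sketch is only that the step length is chosen from how f behaves along the search direction, not from the current value of the weight.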

Hence, based on the discussion above, to answer your question: no, it is not a good idea to update according to

w_(i+1) = w_(i) - w_(i) df/dw.

Why? If w_(i) is large (in context), we'll take a huge step based on very local information, and we would be using something very different from the fine-stepped gradient descent method.

Also, as lejlot points out in the comments below, a negative value of w_(i) would mean you traverse in the (positive) direction of the gradient, i.e., in the direction in which the function grows most rapidly, which is, locally, the worst possible search direction (for minimization problems).
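
Both failure modes can be checked numerically. The following is a small Python sketch under assumed values: the objective (w + 2)^2, the fixed step length 0.1, and the starting points -1 and 50 are hypothetical choices, picked so that the first starting weight is negative and the second is large.

```python
def f(w):
    # Hypothetical objective with its minimum at w = -2.
    return (w + 2.0) ** 2


def df(w):
    # Derivative df/dw of the objective above.
    return 2.0 * (w + 2.0)


def standard_step(w, nu=0.1):
    # Usual update: w - nu * df/dw, with an assumed fixed step length nu.
    return w - nu * df(w)


def scaled_step(w):
    # Update proposed in the question: w - w * df/dw.
    return w - w * df(w)


# Case 1: negative weight. The gradient at w = -1 is positive, so the
# scaled update moves along +gradient and the function value increases.
w_neg = -1.0
f_start = f(w_neg)
f_standard = f(standard_step(w_neg))
f_scaled = f(scaled_step(w_neg))
print(f_start, f_standard, f_scaled)

# Case 2: large weight. The scaled update takes a step that is far
# larger than the standard one, based on the same local gradient.
w_big = 50.0
step_standard = abs(standard_step(w_big) - w_big)
step_scaled = abs(scaled_step(w_big) - w_big)
print(step_standard, step_scaled)
```

In the first case the standard update lowers f while the scaled update raises it; in the second the scaled step overshoots the minimum by a wide margin.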