I'm trying to learn machine learning, so I'm taking a course and am currently studying gradient descent for linear regression. I just learned that if the learning rate is small enough, the value returned by the cost function should decrease on every iteration until convergence. When I imagine this being done in a loop of code, it seems like I could just keep track of the cost from the previous iteration and exit the loop if the new cost is greater than the previous one, since that would tell me the learning rate is too large. I'd like to hear opinions since I'm new to this, but in an effort to keep this question from being primarily opinion-based, my main question is this: would there be anything wrong with this method of detecting a learning rate that needs to be decreased? I'd appreciate an example of a case where this method would fail, if possible.
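To make this concrete, here is roughly the loop I have in mind, sketched for univariate linear regression with a mean-squared-error cost. The toy data, learning rate, and variable names are made up purely for illustration:

import numpy as np

# toy data for y ≈ 2x + 1 (illustrative only)
X = np.linspace(0, 1, 50)
y = 2 * X + 1 + 0.05 * np.random.randn(50)

def cost(w, b):
    return np.mean((w * X + b - y) ** 2)   # mean squared error

w, b = 0.0, 0.0
eta = 1.0                                  # chosen a bit too large here so the check should trigger
prev_cost = cost(w, b)
for i in range(10000):
    err = w * X + b - y
    grad_w = 2 * np.mean(err * X)          # d(cost)/dw
    grad_b = 2 * np.mean(err)              # d(cost)/db
    w -= eta * grad_w
    b -= eta * grad_b
    new_cost = cost(w, b)
    if new_cost > prev_cost:               # the proposed check: cost went up
        print("cost increased at iteration " + str(i) + "; the learning rate is probably too large")
        break
    prev_cost = new_cost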
In the example below, we vary the learning rate eta = 10^k with k = -6, -5, -4, ..., -1, running gradient descent on a simple two-dimensional test function (the Rosenbrock function) starting from x = (-1, -1).
import math
import numpy as np

# test function: f(x) = 100 * (x0^2 - x1)^2 + (x0 - 1)^2, minimized at x = (1, 1)
def f(x):
    return 100 * (x[0] * x[0] - x[1]) ** 2 + (x[0] - 1) ** 2

# analytic gradient of f
def df(x):
    a = x[0] * x[0] - x[1]
    ret = np.zeros(2)
    ret[0] = 400 * a * x[0] + 2 * (x[0] - 1)
    ret[1] = -200 * a
    return ret
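(A quick aside that is not part of the experiment itself: before trusting the sweep, the analytic gradient can be compared against a central finite-difference approximation. The helper below and the step size eps are just illustrative, reusing f, df, and np from above.)

def check_gradient(x, eps=1e-6):
    # compare df(x) with a central finite-difference approximation of the gradient
    num = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = eps
        num[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return np.max(np.abs(num - df(x)))

print(check_gradient(np.array([-1.0, -1.0])))   # should be tiny compared to the gradient entries

With the gradient confirmed, the learning-rate sweep itself: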
for k in range(-6, 0):                       # eta = 1e-6, 1e-5, ..., 0.1
    eta = math.pow(10.0, k)
    print("eta: " + str(eta))
    x = -np.ones(2)                          # start at (-1, -1)
    for it in range(1000000):
        fx = f(x)
        if fx < 1e-10:                       # close enough to the minimum
            print("  solved after " + str(it) + " iterations; f(x) = " + str(fx))
            break
        if fx > 1e10:                        # cost has exploded: learning rate too large
            print("  divergence detected after " + str(it) + " iterations; f(x) = " + str(fx))
            break
        g = df(x)
        x -= eta * g                         # gradient descent step
    else:                                    # iteration budget exhausted without converging or diverging
        print("  not solved; f(x) = " + str(f(x)))
For too small learning rates, the optimization is very slow and the problem is not solved within the iteration budget. For too large learning rates, the optimization process becomes unstable and diverges very quickly. The learning rate must be "just right" for the optimization process to work well.
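As a follow-up to the question's idea of reacting to an increase in cost: a simple variation is to halve eta and retry the step whenever it would increase f, rather than aborting. This is only a rough sketch (the halving factor, starting eta, and iteration budget are arbitrary choices), reusing f and df from above:

x = -np.ones(2)
eta = 1.0                                    # deliberately start too large
fx = f(x)
for it in range(100000):
    if fx < 1e-10:                           # close enough to the minimum
        break
    g = df(x)
    trial = x - eta * g                      # candidate gradient descent step
    f_trial = f(trial)
    if f_trial > fx:                         # cost would increase: halve the learning rate and retry
        eta *= 0.5
        continue
    x, fx = trial, f_trial                   # accept the step
print("final eta: " + str(eta) + "; f(x) = " + str(fx))

This avoids outright divergence, but since eta only ever shrinks it can end up overly conservative and crawl, which is why a plain "new cost > previous cost" test works better as a warning sign than as a complete tuning strategy.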