This is an assignment for a convex optimization class that I'm taking. The assignment is as follows:
Implement the gradient descent algorithm with backtracking line search to find the optimal step size. Your implementation will be compared to Python's scipy.optimize.minimize function. The specific function to minimize is the least squares function. The error between the solution found by the Python library and your implementation must be smaller than 0.001.
I've made an implementation, but the error value hovers around 1, and I've been searching for ways to improve it without much luck. Here's the code that I wrote:
Gradient Descent + Backtracking Line Search Implementation
import numpy as np

# Gradient descent.
def min_gd(fun, x0, grad, args=()):
    alpha = 0.3
    beta = 0.8

    delta_x = -grad(x0, *args)
    t = backtracking_line_search(fun, x0, grad, delta_x, alpha, beta, args)
    x_new = x0 + (t * delta_x)

    if np.linalg.norm(x_new) ** 2 > np.linalg.norm(x0) ** 2:
        return min_gd(fun, x_new, grad, args)
    else:
        return x_new
# Line search function returns optimal step size.
def backtracking_line_search(fun, x, grad, delta_x, alpha, beta, args=()):
    t = 1
    derprod = grad(x, *args) @ delta_x
    while fun((x + (t * delta_x)), *args) > fun(x, *args) + (alpha * t * derprod):
        t *= beta
    return t
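(As a sanity check while debugging, I verified that the line search itself behaves as expected on a toy problem. The quadratic f and its gradient below are made up for illustration, not part of the assignment:)

```python
import numpy as np

# Same line search as above, repeated so this snippet runs standalone.
def backtracking_line_search(fun, x, grad, delta_x, alpha, beta, args=()):
    t = 1
    derprod = grad(x, *args) @ delta_x
    while fun(x + t * delta_x, *args) > fun(x, *args) + alpha * t * derprod:
        t *= beta
    return t

# Toy problem: f(x) = ||x||^2 with gradient 2x.
f = lambda x: float(x @ x)
g = lambda x: 2 * x

x = np.array([2.0])
d = -g(x)  # steepest-descent direction
t = backtracking_line_search(f, x, g, d, 0.3, 0.8)

# The returned step must satisfy the Armijo (sufficient decrease) condition.
armijo_ok = f(x + t * d) <= f(x) + 0.3 * t * (g(x) @ d)
```

The step it returns does satisfy the sufficient-decrease condition, which is why I suspect the problem is in the descent loop rather than the line search.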
Other given functions
import numpy as np
from scipy.optimize import minimize
import gd

# Least Squares function
def LeastSquares(x, A, b):
    return np.linalg.norm(A @ x - b) ** 2

# gradient
def grad_LeastSquares(x, A, b):
    return 2 * ((A.T @ A) @ x - A.T @ b)
The error between the two results is calculated using the L2 norm.
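(For context, the reference solution is obtained roughly like this. The exact test harness isn't given to us, so the problem sizes and the use of np.linalg.lstsq as ground truth here are my own:)

```python
import numpy as np
from scipy.optimize import minimize

def LeastSquares(x, A, b):
    return np.linalg.norm(A @ x - b) ** 2

def grad_LeastSquares(x, A, b):
    return 2 * ((A.T @ A) @ x - A.T @ b)

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x0 = np.zeros(5)

# Reference answer from scipy; jac= supplies the analytic gradient.
ref = minimize(LeastSquares, x0, args=(A, b), jac=grad_LeastSquares)

# Closed-form least-squares solution, used here as ground truth.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
err = np.linalg.norm(ref.x - x_star)
```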
One idea I've come up with is that the tolerance check in my gradient descent function may be flawed. Right now I'm essentially just checking whether the next iterate has a larger norm than the previous one. However, I'm having trouble wrapping my head around how to improve that.
Any feedback is appreciated.
Edit
In case anyone is curious on the final code that I wrote to make it work in the desired way:
def min_gd(fun, x0, grad, args=()):
    alpha = 0.3
    beta = 0.8

    delta_x = -grad(x0, *args)
    t = backtracking_line_search(fun, x0, grad, delta_x, alpha, beta, args)
    x_new = x0 + (t * delta_x)

    if np.linalg.norm(grad(x_new, *args)) < 0.01:
        return x_new
    else:
        return min_gd(fun, x_new, grad, args)
I simply fixed the conditional statement so that instead of comparing the norms of successive iterates, I check whether the gradient norm has fallen below a predetermined tolerance.
Hope this helps anyone in the future.
Your guess about the tolerance check is right: the norm of the current iterate tells you nothing about convergence. A typical stopping criterion is a small gradient norm, so min_gd should look like
def min_gd(fun, x0, grad, args=()):
    alpha = 0.3
    beta = 0.8
    eps = 0.001

    x_new = x0
    delta_x = -grad(x0, *args)
    while np.linalg.norm(delta_x) > eps:
        t = backtracking_line_search(fun, x_new, grad, delta_x, alpha, beta, args)
        x_new = x_new + (t * delta_x)
        delta_x = -grad(x_new, *args)
    return x_new
where eps is some small positive tolerance.
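Putting it together, a self-contained sketch of this corrected loop (with the question's backtracking_line_search unchanged, and a random test problem of my own for demonstration) would be:

```python
import numpy as np

def backtracking_line_search(fun, x, grad, delta_x, alpha, beta, args=()):
    t = 1
    derprod = grad(x, *args) @ delta_x
    # Shrink t until the Armijo sufficient-decrease condition holds.
    while fun(x + t * delta_x, *args) > fun(x, *args) + alpha * t * derprod:
        t *= beta
    return t

def min_gd(fun, x0, grad, args=()):
    alpha = 0.3
    beta = 0.8
    eps = 0.001

    x_new = x0
    delta_x = -grad(x0, *args)
    # Iterate until the gradient norm drops below the tolerance.
    while np.linalg.norm(delta_x) > eps:
        t = backtracking_line_search(fun, x_new, grad, delta_x, alpha, beta, args)
        x_new = x_new + (t * delta_x)
        delta_x = -grad(x_new, *args)
    return x_new

def LeastSquares(x, A, b):
    return np.linalg.norm(A @ x - b) ** 2

def grad_LeastSquares(x, A, b):
    return 2 * ((A.T @ A) @ x - A.T @ b)

# Random overdetermined system as a test case.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x = min_gd(LeastSquares, np.zeros(3), grad_LeastSquares, args=(A, b))
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # closed-form solution
err = np.linalg.norm(x - x_star)
```

On a well-conditioned problem like this, the result lands well within the assignment's 0.001 error bound against the exact least-squares solution.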