Hi, I am working on implementing gradient descent with backtracking line search. However, when I try to update f(x0), the value doesn't change. Could it be something going on with the lambda expressions? I am not too familiar with them.
    import numpy as np
    import math

    alpha = 0.1
    beta = 0.6

    f = lambda x: math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] - 0.1) + math.exp(-1*x[0] - 0.1)
    dfx1 = lambda x: math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] - 0.1) - math.exp(-x[0] - 0.1)
    dfx2 = lambda x: 3*math.exp(x[0] + 3*x[1] - 0.1) - 3*math.exp(x[0] - 3*x[1] - 0.1)

    t = 1
    count = 1
    x0 = np.array([1.0, 1.0])
    dx0 = np.array([1e-3, 1e-3])
    x = []
    d = np.array([-1*dfx1(x0), -1*dfx2(x0)])
    grad = np.array([1*dfx1(x0), 1*dfx2(x0)])

    def backtrack(x0, dfx1, dfx2, t, alpha, beta, count):
        while (f(x0 + t*d) > f(x0) + alpha*t*np.dot(d, grad) or count < 50):
            d[0] = -1*dfx1(x0)
            d[1] = -1*dfx2(x0)
            grad[0] = dfx1(x0)
            grad[1] = dfx2(x0)
            x0[0] = x0[0] + t*d[0]
            x0[1] = x0[1] + t*d[1]
            t *= beta
            count += 1
            x.append(f(x0))
        return t

    t = backtrack(x0, dfx1, dfx2, t, alpha, beta, count)

    print("\nfinal step size :", t)
    print(np.log(x))
    print(f(x0))
Update for the new code: OK, your numbers now change too much!
When writing these routines, stepping through the code with a debugger is really useful to check that the code is doing what you want. In this case you'll see that on the second pass through the loop x0 = [-1.32e+170, 3.96e+170]. Taking the exponential of that is what is causing the problem. If you don't have a debugger, try printing some values.
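To see the overflow itself in isolation (these are illustrative numbers, not your exact ones): `math.exp` raises an `OverflowError` once its argument exceeds roughly 709, which is exactly what happens once x0 blows up to ~1e170.

```python
import math
import numpy as np

# math.exp raises once the argument exceeds ~709 (the double-precision limit),
# which is what happens when x0 grows to ~1e170 in your loop.
try:
    math.exp(800.0)
except OverflowError:
    print("math.exp overflowed")

# np.exp returns inf (with a RuntimeWarning) instead of raising:
print(np.exp(800.0))  # inf
```

This difference also explains why switching between math.exp and np.exp changes whether the code crashes or silently produces inf.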
What can you do to fix it? One fix is to reduce t. Starting with t = 1e-3 stops the problem.
However, I suspect that you have more issues in store. I don't know the technique you are trying to implement, but I suspect that t shouldn't just reduce by a fixed ratio each step, and the np.dot(...) term with anti-parallel vectors also looks suspect.
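For reference, in the textbook version of backtracking line search, t is reset to 1 at every outer gradient-descent step and only shrunk (inside an inner loop) until the Armijo sufficient-decrease condition holds; x is updated once per outer step, not inside the shrinking loop. A minimal sketch against your f (the gradient function, tolerance, and iteration cap here are my choices):

```python
import numpy as np

def f(x):
    return (np.exp(x[0] + 3*x[1] - 0.1)
            + np.exp(x[0] - 3*x[1] - 0.1)
            + np.exp(-x[0] - 0.1))

def grad_f(x):
    a = np.exp(x[0] + 3*x[1] - 0.1)
    b = np.exp(x[0] - 3*x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return np.array([a + b - c, 3*(a - b)])

def gradient_descent(x0, alpha=0.1, beta=0.6, tol=1e-8, max_iter=200):
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                     # steepest-descent direction
        t = 1.0                    # reset the step size every outer iteration
        # inner loop: shrink t until the Armijo condition is satisfied
        # (np.dot(g, d) = -||g||^2 is negative, as it should be)
        while f(x + t*d) > f(x) + alpha * t * np.dot(g, d):
            t *= beta
        x = x + t*d                # one descent step per outer iteration
    return x

x_min = gradient_descent(np.array([1.0, 1.0]))
print(x_min, f(x_min))
```

Note that the inner loop compares against np.dot(g, d), the directional derivative, so the "anti-parallel" issue goes away: d is the negative gradient, making the dot product negative and the condition a genuine sufficient-decrease test.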
If the implementation is not the point of this, is there a library function that does what you want? For example, if you just want to minimise f from the starting point (1, 1), see the minimize function in SciPy [https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html]:
    import scipy.optimize
    import numpy as np

    def get_exponentials(x):
        a = np.exp(x[0] + 3 * x[1] - 0.1)
        b = np.exp(x[0] - 3 * x[1] - 0.1)
        c = np.exp(-x[0] - 0.1)
        return a, b, c

    def dfdx(x):
        a, b, c = get_exponentials(x)
        return np.array([a + b - c, 3 * (a - b)])

    def f(x):
        a, b, c = get_exponentials(x)
        return a + b + c

    scipy.optimize.minimize(f, np.array([1.0, 1.0]), jac=dfdx)
    # x = [-3.46573682e-01, -5.02272755e-08], f = 2.559267
If line search is particularly important to you, there is scipy.optimize.line_search [https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.line_search.html]:
    scipy.optimize.line_search(f, dfdx, np.array([1.0, 1.0]), np.array([-1.0, -1.0]))
    # alpha = 1.0
    # number of function calls = 2
    # number of gradient calls = 1
    # new function value = 2.7145122541078788
    # old function value = 49.85777661748123
    # new gradient = [0.90483742, 0.]
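Note that line_search only returns the step length; you still apply the step yourself. A short sketch reusing f and dfdx from above, where the steepest-descent direction -dfdx(x0) is my choice rather than the fixed [-1, -1] above:

```python
import numpy as np
import scipy.optimize

def get_exponentials(x):
    a = np.exp(x[0] + 3 * x[1] - 0.1)
    b = np.exp(x[0] - 3 * x[1] - 0.1)
    c = np.exp(-x[0] - 0.1)
    return a, b, c

def dfdx(x):
    a, b, c = get_exponentials(x)
    return np.array([a + b - c, 3 * (a - b)])

def f(x):
    a, b, c = get_exponentials(x)
    return a + b + c

x0 = np.array([1.0, 1.0])
d = -dfdx(x0)                                        # steepest-descent direction
alpha, *_ = scipy.optimize.line_search(f, dfdx, x0, d)
x1 = x0 + alpha * d                                  # apply the step yourself
print(f(x1) < f(x0))
```

Repeating this (recompute d at x1, line-search again, step) gives you the full descent loop without hand-rolling the backtracking.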
Finally: you said initially that you weren't happy with the lambdas. There is no need to use them at all; lambdas are not often used to create named functions. See [https://stackoverflow.com/questions/134626/which-is-more-preferable-to-use-lambda-functions-or-nested-functions-def] for more of a discussion.
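For example, the three lambdas at the top of your question can be written as ordinary def functions with no change in behaviour:

```python
import math

# The same three functions from the question, as named def functions.
def f(x):
    return (math.exp(x[0] + 3*x[1] - 0.1)
            + math.exp(x[0] - 3*x[1] - 0.1)
            + math.exp(-x[0] - 0.1))

def dfx1(x):
    return (math.exp(x[0] + 3*x[1] - 0.1)
            + math.exp(x[0] - 3*x[1] - 0.1)
            - math.exp(-x[0] - 0.1))

def dfx2(x):
    return 3*math.exp(x[0] + 3*x[1] - 0.1) - 3*math.exp(x[0] - 3*x[1] - 0.1)

print(f([1.0, 1.0]))  # matches the "old function value" printed by line_search
```

def functions also get a proper __name__ and a place for a docstring, which makes tracebacks and debugging easier than with anonymous lambdas bound to names.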