Tags: python, numpy, convex-optimization

Having trouble making updates in a gradient descent implementation?


Hi, I am working on implementing gradient descent with backtracking line search. However, when I try to update f(x0), the value doesn't change. Could it be something to do with the lambda expressions? I am not too familiar with them.

import numpy as np
import math
alpha = 0.1
beta = 0.6

f = lambda x: math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] -0.1) + math.exp(-1*x[0] - 0.1)
dfx1 = lambda x: math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] -0.1) - math.exp(-x[0] - 0.1)
dfx2 = lambda x: 3*math.exp(x[0] + 3*x[1] - 0.1) - 3*math.exp(x[0] - 3*x[1] -0.1) 

t = 1
count = 1
x0 = np.array([1.0,1.0])
dx0 = np.array([1e-3, 1e-3])

x = []
d = np.array([-1*dfx1(x0),-1*dfx2(x0)]);

grad = np.array([1*dfx1(x0),1*dfx2(x0)])
def backtrack(x0, dfx1, dfx2, t, alpha, beta, count):
    while (f(x0 + t*d) > f(x0) + alpha*t*np.dot(d,grad) or count < 50 ):
        d[0] = -1*dfx1(x0);
        d[1] = -1*dfx2(x0);
        grad[0] = dfx1(x0);
        grad[1] = dfx2(x0);
        x0[0] = x0[0] + t*d[0];
        x0[1] = x0[1] + t*d[1];
        
        t *= beta;
        count += 1
        x.append(f(x0));
    return t

t = backtrack(x0, dfx1, dfx2, t, alpha, beta,count)

print("\nfinal step size :",  t)

print(np.log(x))

print(f(x0))

Solution

  • Update for the new code

    OK, your numbers now change too much!

    When writing these routines, stepping through the code with a debugger is really useful to check that the code is doing what you want. In this case, you'll see that on the second pass through the loop x0 = [-1.32e+170, 3.96e+170]; taking the exponential of that is what causes the problem. If you don't have a debugger, try printing some values.
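    For example, adding a print at the top of the loop body in backtrack() (a small diagnostic of my own, not in the original code) makes the blow-up easy to see:

    print("count =", count, "t =", t, "x0 =", x0)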

    What can you do to fix it? One fix is to reduce t. Starting with t=1e-3 stops the problem.

    However, I suspect that you have more issues in store. I don't know the technique you are trying to implement, but t probably shouldn't just shrink by a fixed ratio on every step, and the np.dot(...) term with anti-parallel vectors also looks suspect.
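    For reference, here is how a textbook backtracking (Armijo) line search is usually structured: the step size is reset at every outer iteration and only shrunk by beta until the sufficient-decrease condition holds. This is my own sketch, not your code, and it assumes grad_f returns the gradient as a NumPy array:

    import numpy as np

    def backtracking_descent(f, grad_f, x0, alpha=0.1, beta=0.6, max_iter=50):
        # Gradient descent with a textbook backtracking (Armijo) line search.
        x = np.asarray(x0, dtype=float)
        history = []
        for _ in range(max_iter):
            g = grad_f(x)
            d = -g                    # steepest-descent direction
            t = 1.0                   # reset the step size every iteration
            # Shrink t until the Armijo sufficient-decrease condition holds.
            while f(x + t * d) > f(x) + alpha * t * np.dot(g, d):
                t *= beta
            x = x + t * d
            history.append(f(x))      # track progress of the objective
        return x, history

    Called as backtracking_descent(f, dfdx, np.array([1.0, 1.0])) with the f and dfdx defined below, it should settle near the same minimiser that minimize finds.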

    If the implementation itself is not the point of this exercise, is there a library function that does what you want? For example, if you just want to minimise f from the starting point (1, 1), see scipy.optimize.minimize [https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html]:

    import scipy.optimize
    import numpy as np

    def get_exponentials(x):
        # The three exponential terms shared by f and its gradient.
        a = np.exp(x[0] + 3 * x[1] - 0.1)
        b = np.exp(x[0] - 3 * x[1] - 0.1)
        c = np.exp(-x[0] - 0.1)
        return a, b, c

    def dfdx(x):
        # Analytic gradient of f, passed to minimize via jac=.
        a, b, c = get_exponentials(x)
        return np.array([a + b - c, 3 * (a - b)])

    def f(x):
        a, b, c = get_exponentials(x)
        return a + b + c

    scipy.optimize.minimize(f, np.array([1.0, 1.0]), jac=dfdx)
    # x = [-3.46573682e-01, -5.02272755e-08], f = 2.559267
    

    If the line search is the part that particularly matters to you, there is scipy.optimize.line_search [https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.line_search.html]:

    scipy.optimize.line_search(f, dfdx, np.array([1.0, 1.0]), np.array([-1.0, -1.0]))
    # alpha 1.0,
    # number of function calls 2,
    # number of gradient calls 1,
    # new function value 2.7145122541078788,
    # old function value 49.85777661748123,
    # new gradient [0.90483742, 0.]
    
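    If you want to embed that in your own descent loop, a minimal sketch (mine, re-using f and dfdx from above) could look like this:

    x = np.array([1.0, 1.0])
    for _ in range(100):
        g = dfdx(x)
        if np.linalg.norm(g) < 1e-8:     # stop once the gradient is tiny
            break
        p = -g                           # steepest-descent direction
        step = scipy.optimize.line_search(f, dfdx, x, p)[0]
        if step is None:                 # the line search failed to find a step
            break
        x = x + step * p

    print(x, f(x))  # should approach the minimiser found by minimize above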

    Finally, you said initially that you weren't too familiar with the lambdas; there is no need to use them here at all. Lambdas are rarely the right way to define named functions, and a plain def is clearer. See [https://stackoverflow.com/questions/134626/which-is-more-preferable-to-use-lambda-functions-or-nested-functions-def] for more discussion.
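    For instance, the three lambdas from your question can be written as ordinary named functions with exactly the same behaviour:

    import math

    def f(x):
        return math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] - 0.1) + math.exp(-x[0] - 0.1)

    def dfx1(x):
        return math.exp(x[0] + 3*x[1] - 0.1) + math.exp(x[0] - 3*x[1] - 0.1) - math.exp(-x[0] - 0.1)

    def dfx2(x):
        return 3*math.exp(x[0] + 3*x[1] - 0.1) - 3*math.exp(x[0] - 3*x[1] - 0.1)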