Tags: c++, c, algorithm, numerical-methods, gradient-descent

Implementing steepest descent algorithm, variable step size


I am trying to implement the steepest descent algorithm in a programming language (C/C++/Fortran).

For example, minimization of f(x1,x2) = x1^3 + x2^3 - 2*x1*x2

  1. Estimate a starting design point x0, set the iteration counter k = 0 and the convergence parameter tolerance = 0.1. Say this starting point is (1,0).

  2. Compute the gradient of f(x1,x2) at the current point x(k) as grad(f). I will use numerical differentiation here (see the finite-difference sketch after this list).

    d/dx1 (f) = lim (h->0) (f(x1+h,x2) - f(x1,x2) )/h

    This is grad(f)=(3*x1^2 - 2*x2, 3*x2^2 - 2*x1)

    grad(f) at (1,0) is c0 = (3,-2)

  3. Since the L2 norm of c0 > tolerance, we proceed to the next step

  4. direction d0 = -c0 = (-3,2)

  5. Calculate the step size a. Minimize f(a) = f(x0 + a*d0) = f(1-3a, 2a) = (1-3a)^3 + (2a)^3 - 2*(1-3a)*(2a). I am not keeping a constant step size.

  6. Update: new[x1,x2] = old[x1,x2] + a*d0.
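For step 2, the gradient does not need symbolics: a forward-difference approximation only needs a couple of extra evaluations of f. A rough C++ sketch of what I mean (the name numerical_gradient and the step h = 1e-6 are just illustrative placeholders):

    #include <utility>   // std::pair

    // Forward-difference approximation of grad(f) at (x1, x2).
    // F is any callable taking (x1, x2); h is the finite-difference step.
    template < typename F >
    std::pair<double, double> numerical_gradient( F f, double x1, double x2,
                                                  double h = 1e-6 ) {
        double f0    = f(x1, x2);
        double dfdx1 = (f(x1 + h, x2) - f0) / h;   // d/dx1
        double dfdx2 = (f(x1, x2 + h) - f0) / h;   // d/dx2
        return { dfdx1, dfdx2 };
    }

    // example: gradient of x1^3 + x2^3 - 2*x1*x2 at the starting point (1,0)
    auto c0 = numerical_gradient(
        []( double x1, double x2 ) { return x1*x1*x1 + x2*x2*x2 - 2.0*x1*x2; },
        1.0, 0.0 );                                // c0 is approximately (3, -2)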

I do not understand how to do step 5. I have a 1D minimization program with the bisection method, and it looks like:

program main()
    ...
    ...
    define upper, lower interval
    call function value
    ...calculations
    ...
    ...


function value (input xin) (output xout)
    ...function is x^4 - 2x^2 + x + 10 
    xout = (xin)^4 - 2*(xin)^2 + (xin) + 10

In this case, looking at step 5, I cannot pass a symbolic a. Any ideas how to implement the algorithm in a programming language, especially step 5? Please suggest if there is an altogether different way to program this. I have seen many programs with a constant step size, but I want to compute it at every step. This algorithm would be easy to implement in MATLAB or Python SymPy using symbolics, but I do not want to use symbolics. Any suggestions appreciated. Thanks.


Solution

  • If C++ is an option, you can take advantage of functors and lambdas.

    Let's consider a function we want to minimize, for example y = x^2 - x + 2. It can be represented as a function object, which is a class with an overloaded operator():

    struct MyFunc {
        double operator()( double x ) const {
            return  x * x - x + 2.0;
        }
    };
    

    Now we can declare an object of this type, use it like a function, and pass it to another templated function as a template parameter.

    #include <iostream>   // std::cout, std::fixed, std::scientific
    #include <iomanip>    // std::setw, std::setprecision

    // given this templated function:
    template < typename F >
    void tabulate_function( F func, double a, double b, int steps ) {
        //  the functor     ^^^^^^  is passed to the templated function
        double step = (b - a) / (steps - 1);
    
        std::cout << "    x          f(x)\n------------------------\n";
        for ( int i = 0; i < steps; ++i ) {
            double x = a + i * step,
                   fx = func(x);
            //          ^^^^^^^ call the operator() of the functor
            std::cout << std::fixed << std::setw(8) << std::setprecision(3) << x
                      << std::scientific << std::setw(16) << std::setprecision(5)
                      << fx << '\n';
        }   
    }
    
    // we can use the previous functor like this:
    MyFunc example;
    tabulate_function(example, 0.0, 2.0, 21);
    

    OP's function can be implemented (given a helper class to represent 2D points) in a similar way:

    struct MyFuncVec {
        double operator()( const Point &p ) const {
            return p.x * p.x * p.x  +  p.y * p.y * p.y  -  2.0 * p.x * p.y;
        }
    };
    

    The gradient of that function can be represented (given a class which implements a 2D vector) by:

    struct MyFuncGradient {
        Vector operator()( const Point &p ) {
            return Vector(3.0 * p.x * p.x  -  2.0 * p.y, 3.0 * p.y * p.y  -  2.0 * p.x);
        }
    };     
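    The Point and Vector helpers are not shown above; a minimal sketch with just the members and operators used in these snippets (the exact layout is an assumption, the linked live example may differ) could be:

    struct Vector {
        double x, y;
        Vector( double x_, double y_ ) : x(x_), y(y_) {}
    };

    struct Point {
        double x, y;
        Point( double x_, double y_ ) : x(x_), y(y_) {}
    };

    // scale a vector:           t * g
    inline Vector operator*( double t, const Vector &v ) {
        return Vector(t * v.x, t * v.y);
    }

    // move a point by a vector: p0 - t*g
    inline Point operator-( const Point &p, const Vector &v ) {
        return Point(p.x - v.x, p.y - v.y);
    }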
    

    Now, the fifth step of OP's question asks to minimize the first function along the direction of the gradient using a one-dimensional optimization algorithm, which requires a one-dimensional function to be passed. We can solve this issue with a lambda:

        MyFuncVec funcOP;
        MyFuncGradient grad_funcOP; 
        Point p0(0.2, 0.8);
        Vector g = grad_funcOP(p0);
    
        // use a lambda to transform the OP function to 1D
        auto sliced_func = [&funcOP, &p0, &g] ( double t ) -> double {
            // those variables ^^^ ^^^ ^^ are captured and used
            return funcOP(p0 - t * g);
        };
    
        tabulate_function(sliced_func, 0, 0.5, 21);
    

    Live example HERE.
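    To actually obtain the step size a of step 5 (rather than just tabulating the slice), sliced_func can be passed to any 1D minimizer, e.g. OP's bisection routine. As an illustration, here is a golden-section search sketch; minimize_1d, the bracket [0, 0.5] and the tolerance are assumptions of mine, not part of the original answer:

    #include <cmath>   // std::sqrt

    // Golden-section search: minimize a unimodal 1D function on [a, b].
    // F can be a functor or a lambda, exactly as in tabulate_function.
    template < typename F >
    double minimize_1d( F func, double a, double b, double tol = 1e-6 ) {
        const double phi = 0.5 * (std::sqrt(5.0) - 1.0);   // ~0.618
        double c = b - phi * (b - a);
        double d = a + phi * (b - a);
        while ( b - a > tol ) {
            // keep the sub-interval that contains the smaller value
            if ( func(c) < func(d) ) { b = d; d = c; c = b - phi * (b - a); }
            else                     { a = c; c = d; d = a + phi * (b - a); }
        }
        return 0.5 * (a + b);
    }

    // continuing the previous snippet: step 5 (optimal a) and step 6 (update)
    double a_step = minimize_1d(sliced_func, 0.0, 0.5);
    Point  p1     = p0 - a_step * g;

    Repeating steps 2-6 from p1 until the gradient norm falls below the tolerance completes the steepest-descent loop.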