Tags: machine-learning, octave, gradient-descent

Gradient descent vs fminunc


I am trying to run gradient descent and cannot get the same result as Octave's built-in fminunc, even when using exactly the same data.

My code is:

%for 5000 iterations
for iter = 1:5000

%%Calculate the cost and the new gradient
[cost, grad] = costFunction(initial_theta, X, y);


%%Gradient = Old Gradient - (Learning Rate * New Gradient)
initial_theta = initial_theta - (alpha * grad);

end 

where costFunction calculates the cost and the gradient when given examples (X, y) and parameters (theta).
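
costFunction itself is not shown here; for logistic regression it would typically look something like this sketch (the actual implementation may differ):

function [J, grad] = costFunction(theta, X, y)
  % Logistic-regression cost (cross-entropy) and its gradient
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));                   % sigmoid hypothesis
  J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m;   % average cost over m examples
  grad = (X' * (h - y)) / m;                        % gradient with respect to theta
end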

Octave's built-in function fminunc, calling the same costFunction with the same data, finds a much better answer in far fewer iterations.

Given that fminunc uses the same cost function, I assume costFunction is correct.

I have tried decreasing the learning rate (in case I was hitting a local minimum) and increasing the number of iterations. The cost stops decreasing, so it seems to have found a minimum, but the final theta still has a much larger cost and is nowhere near as accurate.

Even if fminunc is using a better algorithm, shouldn't gradient descent eventually find the same answer with enough iterations and a smaller learning rate?

Or can anyone see if I am doing anything wrong?

Thank you for any and all help.


Solution

  • Your comments are wrong (the update computes a new theta from the old theta, not a new gradient: theta = theta - alpha * grad), but the algorithm is good.

    In gradient descent it's easy to run into numerical problems, so I suggest performing feature normalization first (see the sketch below).
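
    A minimal mean/std normalization sketch, assuming X holds the raw features and is normalized before the column of ones for the intercept is appended (X_reg is the name the loop below uses for the normalized matrix):

    mu = mean(X);                    % per-feature mean (row vector)
    sigma = std(X);                  % per-feature standard deviation
    sigma(sigma == 0) = 1;           % avoid division by zero for constant features
    X_reg = (X - mu) ./ sigma;       % broadcasting (Octave >= 3.6)
    X_reg = [ones(rows(X_reg), 1), X_reg];   % re-append the intercept column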

    Also, if you're unsure about your learning rate, try adjusting it dynamically. Something like:

    best_cost = Inf;
    theta = initial_theta;
    best_theta = theta;
    best_grad = [];
    alpha = 1;

    for iter = 1:500
      % Evaluate cost and gradient at the current candidate theta
      [cost, grad] = costFunction(theta, X_reg, y);

      if (cost < best_cost)
        % The last step improved the cost: record it and step again
        best_cost = cost;
        best_theta = theta;
        best_grad = grad;
        theta = theta - alpha * grad;
      else
        % The last step overshot: back up to the best theta seen so far
        % and retry from there with a smaller learning rate
        alpha = alpha * 0.99;
        theta = best_theta - alpha * best_grad;
      end
    end
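
    This is a crude backtracking scheme: whenever a step increases the cost, the loop backs up to the best theta found so far and shrinks alpha geometrically, so the step size adapts automatically instead of being hand-tuned.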
    

    Moreover, remember that different theta vectors can give the same decision boundary. For example, for the hypothesis h(x) = theta(0) + theta(1) * x(1) + theta(2) * x(2), these answers give the same boundary:

    theta = [5, 10, 10];
    theta = [10, 20, 20];
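
    That is because the predicted class depends only on the sign of the linear term theta' * x, which a positive scaling does not change. A quick check with a made-up example point (x below is hypothetical):

    theta_a = [5, 10, 10];
    theta_b = [10, 20, 20];                % = 2 * theta_a
    x = [1, 0.3, -0.7];                    % hypothetical point; the leading 1 is the intercept term
    sign(dot(theta_a, x)) == sign(dot(theta_b, x))   % ans = 1: identical prediction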