I am trying to run gradient descent and cannot get the same result as Octave's built-in fminunc, even when using exactly the same data.
My code is:
% for 5000 iterations
for iter = 1:5000
  %% Calculate the cost and the new gradient
  [cost, grad] = costFunction(initial_theta, X, y);
  %% Gradient = Old Gradient - (Learning Rate * New Gradient)
  initial_theta = initial_theta - (alpha * grad);
end
Where costFunction calculates the cost and gradient when given the examples (X, y) and parameters (theta).
The built-in Octave function fminunc, calling the same costFunction with the same data, finds a much better answer in far fewer iterations.
Given that it uses the same cost function, I assume costFunction is correct.
I have tried decreasing the learning rate in case I am hitting a local minimum, and increasing the number of iterations; the cost stops decreasing, so it seems to have found a minimum, but the final theta still has a much larger cost and is nowhere near as accurate.
Even if fminunc is using a better algorithm, shouldn't gradient descent eventually find the same answer with enough iterations and a smaller learning rate?
Or can anyone see if I am doing anything wrong?
Thank you for any and all help.
Your comments are wrong (the update rule computes a new theta, not a new gradient), but the algorithm is fine.
In gradient descent it's easy to run into numerical problems, so I suggest performing feature normalization first.
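A minimal sketch of what I mean (assuming X holds one example per row and the intercept column of ones has not been added yet); the X_reg used in the snippet below could be the result of such a normalization:

% Scale each column of X to zero mean and unit standard deviation
mu = mean(X);                           % 1 x n row vector of column means
sigma = std(X);                         % 1 x n row vector of column std devs
X_norm = (X - mu) ./ sigma;             % relies on Octave's automatic broadcasting
X_reg = [ones(size(X, 1), 1), X_norm];  % re-add the intercept term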
Also, if you're unsure about your learning rate, try adjusting it dynamically. Something like:
best_cost = Inf;
best_theta = initial_theta;
alpha = 1;

for iter = 1:500
  % cost and gradient at the current best theta
  [cost, grad] = costFunction(best_theta, X_reg, y);
  if (cost < best_cost)
    % the cost is still decreasing: take a step and remember this cost
    best_theta = best_theta - alpha * grad;
    best_cost = cost;
  else
    % the last step made things worse: shrink the learning rate
    alpha = alpha * 0.99;
  end
end
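The idea is that alpha only shrinks when a step stops improving the cost, so you don't have to pick the learning rate by hand.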
Moreover, remember that different answers can give the same decision boundary. For example, for the hypothesis h(x) = theta(0) + theta(1) * x(1) + theta(2) * x(2), these answers give the same boundary:
theta = [5, 10, 10];
theta = [10, 20, 20];
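A quick check of this (assuming the usual logistic setup where the predicted class depends only on the sign of theta' * x, so scaling theta doesn't change any prediction):

theta1 = [5; 10; 10];
theta2 = [10; 20; 20];          % = 2 * theta1
X = [ones(5, 1), randn(5, 2)];  % a few random points with the intercept term
pred1 = (X * theta1) >= 0;      % class assignments from theta1
pred2 = (X * theta2) >= 0;      % class assignments from theta2
assert(isequal(pred1, pred2));  % identical decision for every point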