Andrew Ng's course on Coursera, Stanford's Machine Learning course, features programming assignments that involve implementing the algorithms taught in class. The goal of this assignment is to implement linear regression via gradient descent, given inputs X, y, theta, alpha (the learning rate), and the number of iterations.
I implemented this solution in Octave, the prescribed language in the course.
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);                      % number of training examples
    J_history = zeros(num_iters, 1);    % cost recorded at each iteration
    numJ = size(theta, 1);              % number of parameters

    for iter = 1:num_iters
        for i = 1:m
            for j = 1:numJ
                temp = theta(j) - alpha / m * X(i, j) * ((X * theta)(i, 1) - y(i, 1));
                theta(j) = temp;
            end
            prediction = X * theta;     % computed but not used
            J_history(iter, 1) = computeCost(X, y, theta);
        end
    end
end
And here is the cost function:
function J = computeCost(X, y, theta)
    m = length(y);
    J = 0;
    prediction = X * theta;
    error = (prediction - y) .^ 2;      % squared error for each example
    J = 1 / (2 * m) * sum(error);
end
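For context, here is a minimal way these functions could be called on toy data (my own example, not the course data; the hyperparameters are arbitrary):

m = 100;
x = linspace(0, 5, m)';
X = [ones(m, 1), x];                 % add the intercept column
y = 1 + 2 * x + 0.1 * randn(m, 1);   % roughly y = 1 + 2*x plus noise

theta = zeros(2, 1);
alpha = 0.05;
num_iters = 1500;

[theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters);
fprintf('theta = [%.3f; %.3f], final cost = %.4f\n', theta(1), theta(2), J_history(end));

With a working implementation, theta should end up near [1; 2] and J_history should decrease over the iterations.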
This does not pass the submit() function. The submit() function simply validates the result against an unknown test case.
I have checked other questions on Stack Overflow, but I really don't get it. :)
Thank you very much!
Your gradient seems to be correct and, as already pointed out in the answer given by @Kasinath P, it is likely that the problem is that the code is too slow. You just need to vectorize it. In Matlab/Octave, you usually need to avoid for loops (note that although you have parfor in Matlab, it is not yet available in Octave). So it is always better, performance-wise, to write something like A*x instead of iterating over each row of A with a for loop. You can read about vectorization here.
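For instance, here is a rough, illustrative comparison (the matrix sizes are arbitrary, chosen only to make the timing difference visible):

A = rand(5000, 400);
x = rand(400, 1);

tic;
b_loop = zeros(5000, 1);
for i = 1:5000
    b_loop(i) = A(i, :) * x;   % one dot product per row
end
t_loop = toc;

tic;
b_vec = A * x;                 % single vectorized matrix-vector product
t_vec = toc;

fprintf('loop: %.4f s, vectorized: %.4f s, max diff: %g\n', t_loop, t_vec, max(abs(b_loop - b_vec)));

Both compute the same vector, but the vectorized form hands the whole computation to optimized BLAS routines instead of the interpreter.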
If I understand correctly, X is a matrix of size m*numJ, where m is the number of examples and numJ is the number of features (or the dimension of the space where each point lies). In that case, you can rewrite your cost function as
(1/(2*m)) * (X*theta - y)' * (X*theta - y);   % since ||v||_2^2 = v'*v for any vector v in Euclidean space
Now, we know from basic matrix calculus that for any two vectors s and v that are functions from R^{num_J} to R^m, the Jacobian of s^{t} v is given by

s^{t} * Jacobian(v) + v^{t} * Jacobian(s)   % this Jacobian will have size 1 x num_J
Applying that to your cost function, we obtain
jacobian = (1/m) * (theta'*X' - y') * X;
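As a quick sanity check (my own addition, not part of the assignment), you can compare this analytic Jacobian with a finite-difference approximation on random data:

m = 20; numJ = 3;
X = [ones(m, 1), rand(m, numJ - 1)];       % random design matrix with a bias column
y = rand(m, 1);
theta = rand(numJ, 1);

cost = @(t) (1/(2*m)) * (X*t - y)' * (X*t - y);
jacobian = (1/m) * (theta'*X' - y') * X;   % analytic Jacobian, a 1 x numJ row vector

epsilon = 1e-6;
numgrad = zeros(1, numJ);
for j = 1:numJ
    e = zeros(numJ, 1); e(j) = epsilon;
    numgrad(j) = (cost(theta + e) - cost(theta - e)) / (2 * epsilon);   % central difference
end

disp(max(abs(numgrad - jacobian)));        % should be tiny, around 1e-9 or smaller

The two should agree to many decimal places, which confirms the formula above.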
So if you just replace

for i = 1:m
    for j = 1:numJ
        %%% theta(j) updates
    end
end
with

% note that the gradient is the transpose of the Jacobian we've computed
theta -= alpha * (1/m) * X' * (X*theta - y);

you should see a great increase in performance.
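Putting it all together, a fully vectorized version of your function could look like this (a sketch; variable names follow your question, and J_history is now recorded once per iteration instead of once per example):

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters
        % batch update: the gradient is (1/m) * X' * (X*theta - y)
        theta = theta - alpha * (1/m) * X' * (X*theta - y);
        J_history(iter) = computeCost(X, y, theta);
    end
end

Only the loop over iterations remains; the per-example and per-feature loops are gone.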