Tags: octave, gradient-descent

Why is the gradient descent code I have written for Andrew Ng's course not accepted?


Andrew Ng's course on Coursera, which is Stanford's Machine Learning course, features programming assignments in which you implement the algorithms taught in class. The goal of this assignment is to implement linear regression through gradient descent, given X, y, theta, alpha (the learning rate), and the number of iterations as input.

I implemented this solution in Octave, the prescribed language in the course.
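
For reference, the function is called with a design matrix X (including a column of ones for the intercept), a target vector y, an initial theta, a learning rate, and an iteration count. A hypothetical call with a tiny made-up dataset (not the course data) would look like this:

X = [ones(5, 1), (1:5)'];          % 5 examples: intercept column plus one feature
y = [2; 4; 6; 8; 10];              % targets (illustrative values only)
initial_theta = zeros(2, 1);
alpha = 0.01;                      % learning rate
num_iters = 1500;
[theta, J_history] = gradientDescent(X, y, initial_theta, alpha, num_iters);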

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

m = length(y);                      % number of training examples
J_history = zeros(num_iters, 1);    % cost recorded at every iteration
numJ = size(theta, 1);              % number of parameters

for iter = 1:num_iters

    for i = 1:m                     % loop over training examples

        for j = 1:numJ              % loop over parameters
            temp = theta(j) - alpha / m * X(i, j) * ((X * theta)(i, 1) - y(i, 1));
            theta(j) = temp;
        end

        prediction = X * theta;

        J_history(iter, 1) = computeCost(X, y, theta);   % recomputed after every example

    end

end


And here is the cost function it calls:

function J = computeCost(X, y, theta)

m = length(y);                      % number of training examples

prediction = X * theta;             % hypothesis for every example
sqError = (prediction - y) .^ 2;    % squared residuals
J = 1 / (2 * m) * sum(sqError);

end

This does not pass the submit() function, which simply validates the implementation against an unseen test case.

I have checked other questions on Stack Overflow, but I still don't get it. :)

Thank you very much!


Solution

  • Your gradient seems to be correct, and as already pointed out in the answer given by @Kasinath P, the problem is most likely that the code is too slow. You just need to vectorize it. In MATLAB/Octave you usually want to avoid for loops (note that although MATLAB has parfor, it is not yet available in Octave). Performance-wise, it is always better to write something like A*x than to iterate over each row of A with a for loop, as in the small comparison just below. You can read about vectorization here.
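
    A toy comparison of the two styles (not course code; the matrix sizes are arbitrary):

    A = rand(10000, 3);
    x = rand(3, 1);

    % loop version: one row of A at a time
    v = zeros(size(A, 1), 1);
    for i = 1:size(A, 1)
        v(i) = A(i, :) * x;
    end

    % vectorized version: a single matrix-vector product
    v = A * x;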

    If I understand correctly, X is a matrix of size m*numJ, where m is the number of examples and numJ is the number of features (the dimension of the space where each point lives). In that case, you can rewrite your cost function as

    (1/(2*m)) * (X*theta-y)' * (X*theta-y); % since ||v||_2^2 = v'*v for any vector v in Euclidean space
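
    Wrapped into a complete function, a vectorized computeCost could look roughly like this (a minimal sketch that keeps the assignment's signature):

    function J = computeCost(X, y, theta)
    m = length(y);
    err = X * theta - y;              % m*1 vector of residuals
    J = (1/(2 * m)) * (err' * err);   % equals (1/(2m)) * ||err||^2
    end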
    

    Now, we know from basic matrix calculus that for any two vector-valued functions s and v mapping R^{numJ} to R^m, the Jacobian of s^{t}v is given by

    s^{t}*Jacobian(v) + v^{t}*Jacobian(s) % this Jacobian has size 1*numJ
    

    Applying that to your cost function with s = v = X*theta - y, whose Jacobian with respect to theta is simply X, we obtain

    jacobian=(1/m)*(theta'*X'-y')*X;
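
    Spelled out in LaTeX notation (only restating the steps above, plus the transpose used in the update further down):

    \mathrm{Jac}\, J(\theta) = \frac{1}{2m}\left[(X\theta - y)^{T}X + (X\theta - y)^{T}X\right] = \frac{1}{m}(X\theta - y)^{T}X = \frac{1}{m}(\theta^{T}X^{T} - y^{T})X

    \nabla J(\theta) = \left(\mathrm{Jac}\, J(\theta)\right)^{T} = \frac{1}{m}X^{T}(X\theta - y)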
    

    So if you just replace

    for i = 1:m
        for j = 1:numJ
            %%% theta(j) updates
        end
    end
    

    with

    % note that the gradient is the transpose of the Jacobian we've computed
    theta -= alpha * (1/m) * X' * (X*theta - y);
    

    you should see a great increase in performance.
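
    Putting the pieces together, the whole vectorized gradientDescent might look roughly like this (a sketch that keeps the course's function signature; only the loop over iterations remains):

    function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y);
    J_history = zeros(num_iters, 1);

    for iter = 1:num_iters
        % vectorized batch update: the gradient is (1/m) * X' * (X*theta - y)
        theta = theta - alpha * (1/m) * X' * (X * theta - y);
        J_history(iter) = computeCost(X, y, theta);   % record the cost once per iteration
    end
    end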