matlab, machine-learning, octave, gradient-descent

Multivariable gradient descent in MATLAB - what is the difference between the two code versions?


The following function finds the optimal "thetas" for a regression line using gradient descent. The inputs (X, y) are appended below. My question is: what is the difference between code 1 and code 2, and why does code 2 work while code 1 does not?
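
For reference, both code versions are meant to implement the same batch gradient-descent update for linear regression; written out (my own notation, not taken from the code below), each step is

$$\theta_j \leftarrow \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\bigl(x^{(i)}\theta - y^{(i)}\bigr)\,x^{(i)}_j, \qquad j = 1,\dots,n,$$

where $x^{(i)}$ is row $i$ of $X$, $m$ is the number of training examples, and $\alpha$ is the learning rate.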

Thanks in advance!

GRADIENTDESCENTMULTI performs gradient descent to learn theta; it updates theta by taking num_iters gradient steps with learning rate alpha.

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
% Initialize some useful values
m = length(y); % number of training examples
n = length(theta);
J_history = zeros(num_iters, 1);
costs = zeros(n,1);

for iter = 1:num_iters
    % code 1 - doesn't work 
    for c = 1:n
        for i = 1:m
            costs(c) = costs(c)+(X(i,:)*theta - y(i))*X(i,c);
        end  
    end

    % code 2 - does work
    E = X * theta - y;
    for c = 1:n
        costs(c) = sum(E.*X(:,c));
    end

    % update each theta
    for c = 1:n
        theta(c) = theta(c) - alpha*costs(c)/m;
    end
    J_history(iter) = computeCostMulti(X, y, theta);    
end
end

function J = computeCostMulti(X, y, theta)

for i=1:m
    J = J+(X(i,:)*theta - y(i))^2;
end
J = J/(2*m);
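
For completeness, this loop is meant to compute the usual least-squares cost (again in my own notation):

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(x^{(i)}\theta - y^{(i)}\bigr)^2$$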

To run the code:

alpha = 0.01;
num_iters = 200; 

% Init Theta and Run Gradient Descent 
theta = zeros(3, 1);
[theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters);

% Plot the convergence graph
figure;
plot(1:numel(J_history), J_history, '-b', 'LineWidth', 2);
xlabel('Number of iterations');
ylabel('Cost J');

% Display gradient descent's result
fprintf('Theta computed from gradient descent: \n');
fprintf(' %f \n', theta);
fprintf('\n');

X is

1.0000    0.1300   -0.2237
1.0000   -0.5042   -0.2237
1.0000    0.5025   -0.2237
1.0000   -0.7357   -1.5378
1.0000    1.2575    1.0904
1.0000   -0.0197    1.0904
1.0000   -0.5872   -0.2237
1.0000   -0.7219   -0.2237
1.0000   -0.7810   -0.2237
1.0000   -0.6376   -0.2237
1.0000   -0.0764    1.0904
1.0000   -0.0009   -0.2237
1.0000   -0.1393   -0.2237
1.0000    3.1173    2.4045
1.0000   -0.9220   -0.2237
1.0000    0.3766    1.0904
1.0000   -0.8565   -1.5378
1.0000   -0.9622   -0.2237
1.0000    0.7655    1.0904
1.0000    1.2965    1.0904
1.0000   -0.2940   -0.2237
1.0000   -0.1418   -1.5378
1.0000   -0.4992   -0.2237
1.0000   -0.0487    1.0904
1.0000    2.3774   -0.2237
1.0000   -1.1334   -0.2237
1.0000   -0.6829   -0.2237
1.0000    0.6610   -0.2237
1.0000    0.2508   -0.2237
1.0000    0.8007   -0.2237
1.0000   -0.2034   -1.5378
1.0000   -1.2592   -2.8519
1.0000    0.0495    1.0904
1.0000    1.4299   -0.2237
1.0000   -0.2387    1.0904
1.0000   -0.7093   -0.2237
1.0000   -0.9584   -0.2237
1.0000    0.1652    1.0904
1.0000    2.7864    1.0904
1.0000    0.2030    1.0904
1.0000   -0.4237   -1.5378
1.0000    0.2986   -0.2237
1.0000    0.7126    1.0904
1.0000   -1.0075   -0.2237
1.0000   -1.4454   -1.5378
1.0000   -0.1871    1.0904
1.0000   -1.0037   -0.2237

y is

  399900
  329900
  369000
  232000
  539900
  299900
  314900
  198999
  212000
  242500
  239999
  347000
  329999
  699900
  259900
  449900
  299900
  199900
  499998
  599000
  252900
  255000
  242900
  259900
  573900
  249900
  464500
  469000
  475000
  299900
  349900
  169900
  314900
  579900
  285900
  249900
  229900
  345000
  549000
  287000
  368500
  329900
  314000
  299000
  179900
  299900
  239500

Solution

  • I think I have it working right. The main thing is that in code 1 you kept adding to costs(c) but never reset it to zero before the next iteration. The only change you really need is to add costs(c) = 0; after for c = 1:n and before for i = 1:m. I did have to tweak your code a little to get it running for me (mainly computeCostMulti), and I've changed the plots to show that both methods give the same result. Here is a working demo with those changes:

    close all; clear; clc;
    
    %% Data
    X = [1.0000  0.1300 -0.2237; 1.0000 -0.5042 -0.2237; 1.0000  0.5025 -0.2237; 1.0000 -0.7357 -1.5378;
        1.0000  1.2575  1.0904; 1.0000 -0.0197  1.0904; 1.0000 -0.5872 -0.2237; 1.0000 -0.7219 -0.2237;
        1.0000 -0.7810 -0.2237; 1.0000 -0.6376 -0.2237; 1.0000 -0.0764  1.0904; 1.0000 -0.0009 -0.2237;
        1.0000 -0.1393 -0.2237; 1.0000  3.1173  2.4045; 1.0000 -0.9220 -0.2237; 1.0000  0.3766  1.0904;
        1.0000 -0.8565 -1.5378; 1.0000 -0.9622 -0.2237; 1.0000  0.7655  1.0904; 1.0000  1.2965  1.0904;
        1.0000 -0.2940 -0.2237; 1.0000 -0.1418 -1.5378; 1.0000 -0.4992 -0.2237; 1.0000 -0.0487  1.0904;
        1.0000  2.3774 -0.2237; 1.0000 -1.1334 -0.2237; 1.0000 -0.6829 -0.2237; 1.0000  0.6610 -0.2237;
        1.0000  0.2508 -0.2237; 1.0000  0.8007 -0.2237; 1.0000 -0.2034 -1.5378; 1.0000 -1.2592 -2.8519;
        1.0000  0.0495  1.0904; 1.0000  1.4299 -0.2237; 1.0000 -0.2387  1.0904; 1.0000 -0.7093 -0.2237;
        1.0000 -0.9584 -0.2237; 1.0000  0.1652  1.0904; 1.0000  2.7864  1.0904; 1.0000  0.2030  1.0904;
        1.0000 -0.4237 -1.5378; 1.0000  0.2986 -0.2237; 1.0000  0.7126  1.0904; 1.0000 -1.0075 -0.2237;
        1.0000 -1.4454 -1.5378; 1.0000 -0.1871  1.0904; 1.0000 -1.0037 -0.2237];
    y = [399900 329900 369000 232000 539900 299900 314900 198999 212000 242500 239999 347000 329999,...
        699900 259900 449900 299900 199900 499998 599000 252900 255000 242900 259900 573900 249900,...
        464500 469000 475000 299900 349900 169900 314900 579900 285900 249900 229900 345000 549000,...
        287000 368500 329900 314000 299000 179900 299900 239500]';
    
    alpha = 0.01;
    num_iters = 200;
    
    % Init Theta and Run Gradient Descent
    theta0 = zeros(3, 1);
    [theta_result_1, J_history_1] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 1);
    [theta_result_2, J_history_2] = gradientDescentMulti(X, y, theta0, alpha, num_iters, 2);
    
    % Plot the convergence graph for both methods
    figure;
    x = 1:numel(J_history_1);
    subplot(5,1,1:4);
    plot(x,J_history_1,x,J_history_2);
    xlim([min(x) max(x)]);
    set(gca,'XTickLabel','');
    ylabel('Cost J');
    grid on;
    
    subplot(5,1,5);
    stem(x,(J_history_1-J_history_2)./J_history_1,'ko');
    xlim([min(x) max(x)]);
    xlabel('Number of iterations');
    ylabel('frac. \DeltaJ');
    grid on;
    
    % Display gradient descent's result
    fprintf('Theta computed from gradient descent with method 1: \n');
    fprintf(' %f \n', theta_result_1);
    fprintf('Theta computed from gradient descent with method 2: \n');
    fprintf(' %f \n', theta_result_2);
    fprintf('\n');
    

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters, METHOD)
    % Initialize some useful values
    m = length(y); % number of training examples
    n = length(theta);
    J_history = zeros(num_iters, 1);
    
    costs = zeros(n,1);
    for iter = 1:num_iters
    
        if METHOD == 1 % code 1 - now works, because costs(c) is reset below
            for c = 1:n
                costs(c) = 0; % the crucial fix: reset the accumulator every iteration
                for i = 1:m
                    costs(c) = costs(c) + (X(i,:)*theta - y(i)) *X(i,c);
                end
            end
        elseif METHOD == 2 % code 2 - does work
            E = X * theta - y;
            for c = 1:n
                costs(c) = sum(E.*X(:,c));
            end
        else
            error('unknown method');
        end
    
        % update each theta
        for c = 1:n
            theta(c) = theta(c) - alpha*costs(c)/m;
        end
        J_history(iter) = computeCostMulti(X, y, theta);
    end
    end
    

    function J = computeCostMulti(X, y, theta)
    m = length(y); J = 0;
    for mi = 1:m
        J = J + (X(mi,:)*theta - y(mi))^2;
    end
    J = J/(2*m);
    end
    

    But, again, all you really need to add is the costs(c) = 0; line.
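
    As a side note: once that bug is fixed, both inner loops can be collapsed into a single matrix expression, because X' * (X*theta - y) accumulates exactly the same sums. Here is a minimal sketch of that vectorised variant, meant to replace the METHOD branches and the theta-update loop inside gradientDescentMulti (same variables as above, so treat it as a sketch rather than tested drop-in code):

    % vectorised gradient step - equivalent to methods 1 and 2 above
    for iter = 1:num_iters
        grad = X' * (X * theta - y) / m;   % n-by-1 gradient of the cost J
        theta = theta - alpha * grad;      % update all components of theta at once
        J_history(iter) = computeCostMulti(X, y, theta);
    end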

    Also, I'd recommend always putting close all; clear; clc; at the beginning of your scripts, so they run cleanly as self-contained examples when copied and pasted from Stack Overflow.