Tags: matlab, machine-learning, linear-regression, gradient-descent

Gradient Descent and Closed Form Solution - Different Hypothesis Lines in MATLAB


I'm coding along with what I'm learning about linear regression from the Coursera Machine Learning course (in MATLAB). I found a similar post here, but I can't quite follow everything in it, perhaps because my fundamentals in machine learning are a bit weak.

The problem I'm facing is that, for some data, both gradient descent (GD) and the Closed Form Solution (CFS) give the same hypothesis line, but on one particular dataset the results are different. I've read something about the results being the same if the data is singular, but I have no idea how to check whether or not my data is singular.

I will try to illustrate the best I can:

1) Firstly, here is the MATLAB code, adapted from here. For the given dataset everything turned out fine: both GD and the CFS gave similar results.

The Dataset

X                   Y
2.06587460000000    0.779189260000000
2.36840870000000    0.915967570000000
2.53999290000000    0.905383540000000
2.54208040000000    0.905661380000000
2.54907900000000    0.938988900000000
2.78668820000000    0.966847400000000
2.91168250000000    0.964368240000000
3.03562700000000    0.914459390000000
3.11466960000000    0.939339440000000
3.15823890000000    0.960749710000000
3.32759440000000    0.898370940000000
3.37931650000000    0.912097390000000
3.41220060000000    0.942384990000000
3.42158230000000    0.966245780000000
3.53157320000000    1.05265000000000
3.63930020000000    1.01437910000000
3.67325370000000    0.959694260000000
3.92564620000000    0.968537160000000
4.04986460000000    1.07660650000000
4.24833480000000    1.14549780000000
4.34400520000000    1.03406250000000
4.38265310000000    1.00700090000000
4.42306020000000    0.966836480000000
4.61024430000000    1.08959190000000
4.68811830000000    1.06344620000000
4.97773330000000    1.12372390000000
5.03599670000000    1.03233740000000
5.06845360000000    1.08744520000000
5.41614910000000    1.07029880000000
5.43956230000000    1.16064930000000
5.45632070000000    1.07780370000000
5.56984580000000    1.10697580000000
5.60157290000000    1.09718750000000
5.68776170000000    1.16486030000000
5.72156020000000    1.14117960000000
5.85389140000000    1.08441560000000
6.19780260000000    1.12524930000000
6.35109410000000    1.11683410000000
6.47970330000000    1.19707890000000
6.73837910000000    1.20694620000000
6.86376860000000    1.12510460000000
7.02233870000000    1.12356720000000
7.07823730000000    1.21328290000000
7.15142320000000    1.25226520000000
7.46640230000000    1.24970650000000
7.59738740000000    1.17997060000000
7.74407170000000    1.18972990000000
7.77296620000000    1.30299340000000
7.82645140000000    1.26011340000000
7.93063560000000    1.25622670000000

My MATLAB code:

clear all; close all; clc; 
x = load('ex2x.dat');
y = load('ex2y.dat');

m = length(y); % number of training examples

% Plot the training data
figure; % open a new figure window
plot(x, y, '*r');
ylabel('Height in meters')
xlabel('Age in years')

% Gradient descent
x = [ones(m, 1) x]; % Add a column of ones to x
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
MAX_ITR = 1500;
alpha = 0.07;

for num_iterations = 1:MAX_ITR

    thetax = x * theta;
    % for theta_0 and x_0
    grad0 = (1/m) .* sum( x(:,1)' * (thetax - y));
    % for theta_1 and x_1
    grad1 = (1/m) .* sum( x(:,2)' * (thetax - y));

    % Here is the actual update
    theta(1) = theta(1) - alpha .* grad0;
    theta(2) = theta(2) - alpha .* grad1;
end
% print theta to screen
theta

% Plot the hypothesis (a.k.a. linear fit)
hold on
plot(x(:,2), x*theta, 'ob')
% Plot using the Closed Form Solution
plot(x(:,2), x*((x' * x)\x' * y), '--r')
legend('Training data', 'Linear regression', 'Closed Form')
hold off % don't overlay any more plots on this figure

[EDIT: Sorry for the wrong labeling... it's not the Normal Equation, but the Closed Form Solution. My mistake.] The results for this code are shown below, which is peachy :D (plot: same hypothesis line for both Gradient Descent and the Closed Form Solution).
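For reference, a quick numerical sanity check (just a sketch, not part of my original script; it assumes x, y, and theta from the code above are still in the workspace) is to compare the GD parameters with the closed-form ones directly:

    % least-squares (closed-form) parameters for comparison
    theta_cfs = (x' * x) \ (x' * y);
    disp([theta theta_cfs]);   % side-by-side parameter vectors
    fprintf('||theta_GD - theta_CFS|| = %g\n', norm(theta - theta_cfs));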

2) Now I am testing my code with another dataset. The URL for the dataset is here - GRAY KANGAROOS. I converted it to CSV and read it into MATLAB. Note that I scaled the data (divided by the maximum); if I don't do that, no hypothesis line appears at all and the thetas come out as Not a Number (NaN) in MATLAB.

The Gray Kangaroo Dataset:

X    Y
609 241
629 222
620 233
564 207
645 247
493 189
606 226
660 240
630 215
672 231
778 263
616 220
727 271
810 284
778 279
823 272
755 268
710 278
701 238
803 255
855 308
838 281
830 288
864 306
635 236
565 204
562 216
580 225
596 220
597 219
636 201
559 213
615 228
740 234
677 237
675 217
629 211
692 238
710 221
730 281
763 292
686 251
717 231
737 275
816 275

The changes I made to the code to read in this dataset:

    dataset = load('kangaroo.csv');
    % scale each column by its maximum so gradient descent does not diverge
    x = dataset(:,1)/max(dataset(:,1));
    y = dataset(:,2)/max(dataset(:,2));
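Dividing by the maximum is just one way to bring the features into a range where alpha = 0.07 behaves; a more common alternative (again only a sketch, not what I actually ran) is mean/standard-deviation normalization of the input, after which the fit has to be mapped back to the original units for plotting:

    dataset = load('kangaroo.csv');
    x_raw = dataset(:,1);
    y = dataset(:,2);

    % z-score normalization: zero mean, unit standard deviation
    mu = mean(x_raw);
    sigma = std(x_raw);
    x = (x_raw - mu) / sigma;

    % ... add the column of ones and run gradient descent as before ...

    % to plot in the original units, note that the hypothesis is
    % h(x_raw) = theta(1) + theta(2) * (x_raw - mu) / sigma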

The results came out like this: [EDIT: Sorry for the wrong labeling... it's not the Normal Equation, but the Closed Form Solution. My mistake.]

(Plot: different hypothesis lines for GD and CFS.)

I was wondering if there is any explanation for this discrepancy. Any help would be much appreciated. Thank you in advance!


Solution

  • I haven't run your code, but let me throw some theory at you:

    If your code is right (and it looks like it is): increase MAX_ITR and the fit will look better.

    Gradient descent is not guaranteed to have converged by MAX_ITR, and in fact gradient descent is quite a slow method (convergence-wise).

    The convergence of gradient descent for a "standard" convex function (like the one you are trying to solve) looks like this (image from the Internet):

    https://spin.atomicobject.com/wp-content/uploads/gradient_descent_error_by_iteration.png

    Forget about the iteration numbers, as they depend on the problem, and focus on the shape. What may be happening is that your MAX_ITR falls somewhere around "20" in this image, so your result is good, but not the best possible!
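    A simple way to see this (a sketch only, reusing the x, y, m, alpha and MAX_ITR variables from the question's script) is to record the least-squares cost at every iteration and plot it; if the curve is still clearly decreasing at the last iteration, gradient descent has not converged yet:

        theta = zeros(size(x, 2), 1);       % [theta_0; theta_1]
        J_history = zeros(MAX_ITR, 1);      % cost after each update

        for k = 1:MAX_ITR
            grad = (1/m) * (x' * (x * theta - y));   % gradient of the squared-error cost
            theta = theta - alpha * grad;            % simultaneous update of both parameters
            J_history(k) = (1/(2*m)) * sum((x * theta - y).^2);
        end

        figure;
        plot(1:MAX_ITR, J_history, '-b');
        xlabel('Iteration');
        ylabel('Cost J(\theta)');

    If J_history is still dropping at the end, a larger MAX_ITR (or a larger alpha, as long as the cost keeps decreasing) should bring the GD line closer to the closed-form one.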

    However, solving the normal equations directly gives you the minimum squared-error solution (I assume that by normal equation you mean x = (A'*A)^(-1)*A'*b). The problem is that there are plenty of cases where you cannot store A in memory, or where an ill-posed problem makes A'*A ill-conditioned and the normal equation numerically unstable; in those cases gradient descent is used instead.
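    That also gives a practical answer to the "is my data singular?" part of the question. A quick check (a sketch using only built-in MATLAB functions, with x being the design matrix that already includes the column of ones) is to look at the rank of x and the condition number of x'*x; a rank smaller than the number of columns means singular, and a very large condition number means nearly singular / ill-conditioned:

        A = x' * x;   % the matrix the closed-form solution has to invert
        fprintf('rank(x)    = %d (out of %d columns)\n', rank(x), size(x, 2));
        fprintf('cond(x''*x) = %g\n', cond(A));   % very large values signal trouble

        % if A is well conditioned, gradient descent (run long enough) and
        % the closed form (x'*x)^(-1)*x'*y agree to within numerical precision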

    more info