Tags: matlab, machine-learning, linear-regression, gradient-descent

Gradient Descent and Closed Form Solution - Different Hypothesis Lines in MATLAB


I'm coding along with what I'm learning about linear regression from the Coursera Machine Learning course (in MATLAB). I found a similar post here, but I can't quite follow everything in it, perhaps because my fundamentals in machine learning are a bit weak.

The problem I'm facing is that, for some data, both gradient descent (GD) and the Closed Form Solution (CFS) give the same hypothesis line, but on one particular dataset the results are different. I've read something about the results being the same if the data is singular, but I have no idea how to check whether or not my data is singular.

I will try to illustrate the best I can:

1) Firstly, here is the MATLAB code, adapted from here. For the given dataset everything turned out fine: both GD and the CFS gave similar results.

The Dataset

X                   Y
2.06587460000000    0.779189260000000
2.36840870000000    0.915967570000000
2.53999290000000    0.905383540000000
2.54208040000000    0.905661380000000
2.54907900000000    0.938988900000000
2.78668820000000    0.966847400000000
2.91168250000000    0.964368240000000
3.03562700000000    0.914459390000000
3.11466960000000    0.939339440000000
3.15823890000000    0.960749710000000
3.32759440000000    0.898370940000000
3.37931650000000    0.912097390000000
3.41220060000000    0.942384990000000
3.42158230000000    0.966245780000000
3.53157320000000    1.05265000000000
3.63930020000000    1.01437910000000
3.67325370000000    0.959694260000000
3.92564620000000    0.968537160000000
4.04986460000000    1.07660650000000
4.24833480000000    1.14549780000000
4.34400520000000    1.03406250000000
4.38265310000000    1.00700090000000
4.42306020000000    0.966836480000000
4.61024430000000    1.08959190000000
4.68811830000000    1.06344620000000
4.97773330000000    1.12372390000000
5.03599670000000    1.03233740000000
5.06845360000000    1.08744520000000
5.41614910000000    1.07029880000000
5.43956230000000    1.16064930000000
5.45632070000000    1.07780370000000
5.56984580000000    1.10697580000000
5.60157290000000    1.09718750000000
5.68776170000000    1.16486030000000
5.72156020000000    1.14117960000000
5.85389140000000    1.08441560000000
6.19780260000000    1.12524930000000
6.35109410000000    1.11683410000000
6.47970330000000    1.19707890000000
6.73837910000000    1.20694620000000
6.86376860000000    1.12510460000000
7.02233870000000    1.12356720000000
7.07823730000000    1.21328290000000
7.15142320000000    1.25226520000000
7.46640230000000    1.24970650000000
7.59738740000000    1.17997060000000
7.74407170000000    1.18972990000000
7.77296620000000    1.30299340000000
7.82645140000000    1.26011340000000
7.93063560000000    1.25622670000000

My MATLAB code:

clear all; close all; clc; 
x = load('ex2x.dat');
y = load('ex2y.dat');

m = length(y); % number of training examples

% Plot the training data
figure; % open a new figure window
plot(x, y, '*r');
ylabel('Height in meters')
xlabel('Age in years')

% Gradient descent
x = [ones(m, 1) x]; % Add a column of ones to x
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
MAX_ITR = 1500;
alpha = 0.07;

for num_iterations = 1:MAX_ITR

    thetax = x * theta;
    % for theta_0 and x_0
    grad0 = (1/m) .* sum( x(:,1)' * (thetax - y));
    % for theta_1 and x_1
    grad1 = (1/m) .* sum( x(:,2)' * (thetax - y));

    % Here is the actual update
    theta(1) = theta(1) - alpha .* grad0;
    theta(2) = theta(2) - alpha .* grad1;
end
% print theta to screen
theta

% Plot the hypothesis (a.k.a. linear fit)
hold on
plot(x(:,2), x*theta, 'ob')
% Plot using the Closed Form Solution
plot(x(:,2), x*((x' * x)\x' * y), '--r')
legend('Training data', 'Linear regression', 'Closed Form')
hold off % don't overlay any more plots on this figure

[EDIT: Sorry for the wrong labeling... it's not the Normal Equation, but the Closed Form Solution. My mistake.] The results for this code are shown below, which is peachy :D (plot: same hypothesis line for both Gradient Descent and the Closed Form Solution).
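For reference, a quick numerical sanity check (just a sketch, not part of my original script; it assumes x, y, and theta from the code above are still in the workspace) is to compare the GD parameters with the closed-form ones directly:

    % least-squares (closed-form) parameters for comparison
    theta_cfs = (x' * x) \ (x' * y);
    disp([theta theta_cfs]);   % side-by-side parameter vectors
    fprintf('||theta_GD - theta_CFS|| = %g\n', norm(theta - theta_cfs));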

2) Now I am testing my code with another dataset. The URL for the dataset is here - GRAY KANGAROOS. I converted it to CSV and read it into MATLAB. Note that I scaled the data (divided by the maximum); if I don't do that, no hypothesis line appears at all and the thetas come out as Not a Number (NaN) in MATLAB.

The Gray Kangaroo Dataset:

X    Y
609 241
629 222
620 233
564 207
645 247
493 189
606 226
660 240
630 215
672 231
778 263
616 220
727 271
810 284
778 279
823 272
755 268
710 278
701 238
803 255
855 308
838 281
830 288
864 306
635 236
565 204
562 216
580 225
596 220
597 219
636 201
559 213
615 228
740 234
677 237
675 217
629 211
692 238
710 221
730 281
763 292
686 251
717 231
737 275
816 275

The changes I made to the code to read in this dataset:

    dataset = load('kangaroo.csv');
    % scale each column by its maximum so gradient descent does not diverge
    x = dataset(:,1)/max(dataset(:,1));
    y = dataset(:,2)/max(dataset(:,2));
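Dividing by the maximum is just one way to bring the features into a range where alpha = 0.07 behaves; a more common alternative (again only a sketch, not what I actually ran) is mean/standard-deviation normalization of the input, after which the fit has to be mapped back to the original units for plotting:

    dataset = load('kangaroo.csv');
    x_raw = dataset(:,1);
    y = dataset(:,2);

    % z-score normalization: zero mean, unit standard deviation
    mu = mean(x_raw);
    sigma = std(x_raw);
    x = (x_raw - mu) / sigma;

    % ... add the column of ones and run gradient descent as before ...

    % to plot in the original units, note that the hypothesis is
    % h(x_raw) = theta(1) + theta(2) * (x_raw - mu) / sigma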

The results came out like this: [EDIT: Sorry for the wrong labeling... it's not the Normal Equation, but the Closed Form Solution. My mistake.]

(Plot: different hypothesis lines for GD and CFS.)

I was wondering if there is any explanation for this discrepancy. Any help would be much appreciated. Thank you in advance!


Solution

  • I haven't run your code, but let me throw some theory at you:

    If your code is right (and it looks like it is): increase MAX_ITR and the fit will look better.

    Gradient descent is not guaranteed to have converged by MAX_ITR, and in fact gradient descent is quite a slow method (convergence-wise).

    The convergence of gradient descent for a "standard" convex function (like the one you are trying to solve) looks like this (image from the Internet):

    https://spin.atomicobject.com/wp-content/uploads/gradient_descent_error_by_iteration.png

    Forget about the iteration numbers, as they depend on the problem, and focus on the shape. What may be happening is that your MAX_ITR falls somewhere around "20" in this image, so your result is good, but not the best possible!
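    A simple way to see this (a sketch only, reusing the x, y, m, alpha and MAX_ITR variables from the question's script) is to record the least-squares cost at every iteration and plot it; if the curve is still clearly decreasing at the last iteration, gradient descent has not converged yet:

        theta = zeros(size(x, 2), 1);       % [theta_0; theta_1]
        J_history = zeros(MAX_ITR, 1);      % cost after each update

        for k = 1:MAX_ITR
            grad = (1/m) * (x' * (x * theta - y));   % gradient of the squared-error cost
            theta = theta - alpha * grad;            % simultaneous update of both parameters
            J_history(k) = (1/(2*m)) * sum((x * theta - y).^2);
        end

        figure;
        plot(1:MAX_ITR, J_history, '-b');
        xlabel('Iteration');
        ylabel('Cost J(\theta)');

    If J_history is still dropping at the end, a larger MAX_ITR (or a larger alpha, as long as the cost keeps decreasing) should bring the GD line closer to the closed-form one.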

    However, solving the normal equations directly gives you the minimum squared-error solution (I assume that by normal equation you mean x = (A'*A)^(-1)*A'*b). The problem is that there are plenty of cases where you cannot store A in memory, or where an ill-posed problem makes A'*A ill-conditioned and the normal equation numerically unstable; in those cases gradient descent is used instead.
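    That also gives a practical answer to the "is my data singular?" part of the question. A quick check (a sketch using only built-in MATLAB functions, with x being the design matrix that already includes the column of ones) is to look at the rank of x and the condition number of x'*x; a rank smaller than the number of columns means singular, and a very large condition number means nearly singular / ill-conditioned:

        A = x' * x;   % the matrix the closed-form solution has to invert
        fprintf('rank(x)    = %d (out of %d columns)\n', rank(x), size(x, 2));
        fprintf('cond(x''*x) = %g\n', cond(A));   % very large values signal trouble

        % if A is well conditioned, gradient descent (run long enough) and
        % the closed form (x'*x)^(-1)*x'*y agree to within numerical precision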

    more info