I'm in the process of coding what I'm learning about Linear Regression from the Coursera Machine Learning course (MATLAB). I found a similar post here, but I can't seem to understand everything in it, perhaps because my fundamentals in Machine Learning are a bit weak.
The problem I'm facing is that, for some data, both gradient descent (GD) and the Closed Form Solution (CFS) give the same hypothesis line, but on one particular dataset the results are different. I've read that if the data is singular then the results should be the same; however, I have no idea how to check whether or not my data is singular.
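(For reference, my own guess, and it is only a guess, is that "singular" refers to the matrix X'*X used in the closed form. The quickest check I could think of in MATLAB, reusing the m and x defined in the code below, would be something like:)
X = [ones(m, 1) x];                           % design matrix with the bias column
fprintf('rank(X''*X) = %d\n', rank(X' * X));  % rank < 2 would mean singular
fprintf('cond(X''*X) = %g\n', cond(X' * X));  % a huge value means nearly singular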
I will try to illustrate the best I can:
1) Firstly, here is the MATLAB code, adapted from here. For the given dataset, everything turned out fine: both GD and the CFS gave similar results.
The Dataset
X Y
2.06587460000000 0.779189260000000
2.36840870000000 0.915967570000000
2.53999290000000 0.905383540000000
2.54208040000000 0.905661380000000
2.54907900000000 0.938988900000000
2.78668820000000 0.966847400000000
2.91168250000000 0.964368240000000
3.03562700000000 0.914459390000000
3.11466960000000 0.939339440000000
3.15823890000000 0.960749710000000
3.32759440000000 0.898370940000000
3.37931650000000 0.912097390000000
3.41220060000000 0.942384990000000
3.42158230000000 0.966245780000000
3.53157320000000 1.05265000000000
3.63930020000000 1.01437910000000
3.67325370000000 0.959694260000000
3.92564620000000 0.968537160000000
4.04986460000000 1.07660650000000
4.24833480000000 1.14549780000000
4.34400520000000 1.03406250000000
4.38265310000000 1.00700090000000
4.42306020000000 0.966836480000000
4.61024430000000 1.08959190000000
4.68811830000000 1.06344620000000
4.97773330000000 1.12372390000000
5.03599670000000 1.03233740000000
5.06845360000000 1.08744520000000
5.41614910000000 1.07029880000000
5.43956230000000 1.16064930000000
5.45632070000000 1.07780370000000
5.56984580000000 1.10697580000000
5.60157290000000 1.09718750000000
5.68776170000000 1.16486030000000
5.72156020000000 1.14117960000000
5.85389140000000 1.08441560000000
6.19780260000000 1.12524930000000
6.35109410000000 1.11683410000000
6.47970330000000 1.19707890000000
6.73837910000000 1.20694620000000
6.86376860000000 1.12510460000000
7.02233870000000 1.12356720000000
7.07823730000000 1.21328290000000
7.15142320000000 1.25226520000000
7.46640230000000 1.24970650000000
7.59738740000000 1.17997060000000
7.74407170000000 1.18972990000000
7.77296620000000 1.30299340000000
7.82645140000000 1.26011340000000
7.93063560000000 1.25622670000000
My MATLAB code:
clear all; close all; clc;
x = load('ex2x.dat');
y = load('ex2y.dat');
m = length(y); % number of training examples
% Plot the training data
figure; % open a new figure window
plot(x, y, '*r');
ylabel('Height in meters')
xlabel('Age in years')
% Gradient descent
x = [ones(m, 1) x]; % Add a column of ones to x
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
MAX_ITR = 1500;
alpha = 0.07;
for num_iterations = 1:MAX_ITR
thetax = x * theta;
% for theta_0 and x_0
grad0 = (1/m) .* sum( x(:,1)' * (thetax - y));
% for theta_1 and x_1
grad1 = (1/m) .* sum( x(:,2)' * (thetax - y));
% Here is the actual update
theta(1) = theta(1) - alpha .* grad0;
theta(2) = theta(2) - alpha .* grad1;
end
% print theta to screen
theta
% Plot the hypothesis (a.k.a. linear fit)
hold on
plot(x(:,2), x*theta, 'ob')
% Plot using the Closed Form Solution
plot(x(:,2), x*((x' * x)\x' * y), '--r')
legend('Training data', 'Linear regression', 'Closed Form')
hold off % don't overlay any more plots on this figure
[EDIT: Sorry for the wrong labeling... It's not Normal Equation, but Closed Form Solution. My mistake]
The results for this code are shown below (which is peachy :D, the same results for both GD and CFS):
The Gray Kangaroo Dataset:
X Y
609 241
629 222
620 233
564 207
645 247
493 189
606 226
660 240
630 215
672 231
778 263
616 220
727 271
810 284
778 279
823 272
755 268
710 278
701 238
803 255
855 308
838 281
830 288
864 306
635 236
565 204
562 216
580 225
596 220
597 219
636 201
559 213
615 228
740 234
677 237
675 217
629 211
692 238
710 221
730 281
763 292
686 251
717 231
737 275
816 275
The changes I made to the code to read in this dataset:
dataset = load('kangaroo.csv');
% scale?
x = dataset(:,1)/max(dataset(:,1));
y = dataset(:,2)/max(dataset(:,2));
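As an aside, I wasn't sure whether dividing by the maximum is the right kind of scaling; this is just an experiment of mine, but the course also mentions mean normalization, which I could have tried instead:
x = dataset(:,1);
y = dataset(:,2);
x = (x - mean(x)) / std(x);   % zero mean, unit standard deviation, only for gradient descent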
The results came out like this: [EDIT: Sorry for the wrong labeling... It's not Normal Equation, but Closed Form Solution. My mistake]
I was wondering if there is any explanation for this discrepancy. Any help would be much appreciated. Thank you in advance!
I haven't run your code, but let me throw some theory at you:
If your code is right (and it looks like it is): increase MAX_ITR and it will look better. Gradient descent is not guaranteed to have converged by MAX_ITR iterations; in fact, gradient descent is a fairly slow method (convergence-wise).
The convergence of gradient descent for a "standard" convex function (like the one you are trying to solve) looks like this (from the Internets):
Forget about the iteration numbers, as they depend on the problem, and focus on the shape. What may be happening is that your MAX_ITR falls somewhere around "20" in this image. In that case your result is good, but not the best possible!
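A cheap way to check this in your own code (a minimal sketch reusing your variable names; only the cost bookkeeping is new) is to record the cost at every iteration and plot it. If the curve is still clearly decreasing at MAX_ITR, you simply stopped too early:
J = zeros(MAX_ITR, 1);                       % cost history
for num_iterations = 1:MAX_ITR
    thetax = x * theta;
    grad = (1/m) .* (x' * (thetax - y));     % both partial derivatives at once
    theta = theta - alpha .* grad;
    J(num_iterations) = (1/(2*m)) * sum((x * theta - y).^2);
end
figure; plot(J);                             % still sloping down => increase MAX_ITR
xlabel('iteration'); ylabel('cost J(\theta)');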
However, solving the normal equations directly will give you the minimum squared-error solution (I assume that by normal equation you mean x = (A'*A)^(-1)*A'*b). The problem is that there are plenty of cases where you cannot store A in memory, or where an ill-posed problem makes the normal equations ill-conditioned and numerically unstable; that is when gradient descent is used instead.
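In MATLAB you rarely need the explicit inverse anyway: the backslash operator (or pinv for a rank-deficient matrix) solves the same least-squares problem more stably, and cond tells you how badly conditioned the system is. A sketch, where A and b are just generic names for your design matrix and target vector:
theta_ne = (A' * A) \ (A' * b);   % normal equations without forming an explicit inverse
theta_ls = A \ b;                 % QR-based least squares, usually the safer choice
theta_pi = pinv(A) * b;           % pseudo-inverse, also handles rank deficiency
fprintf('cond(A''*A) = %g\n', cond(A' * A));   % very large => ill-conditioned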