I'm trying to figure out gradient descent with Octave. With each iteration, my thetas get exponentially larger. I'm not sure what the issue is as I'm copying another function directly.
Here are my matrices:
X = 1 98
1 94
1 93
1 88
1 84
1 82
1 79
y = 97
94
94
78
85
85
76
theta = 1
1
I'm using this formula:
theta = theta - 0.001 * (1 / 7) * (X' * (X * theta - y))
I figured out what the optimal thetas are using the normal equation, but after only a few iterations my thetas are in the several thousands. Any idea what's wrong?
It looks like you are running gradient descent for linear regression with a learning rate that is too high, as the previous answers already mention. This post just adds some visualization and explains exactly what is happening in your case.
As shown in the figure below, when the learning rate is too high, gradient descent fails to converge to the global minimum of the convex cost surface: the theta values overshoot and oscillate around the minimum because each step is too large (RHS plot). If you decrease the learning rate (LHS plot), convergence is slower, but you eventually reach the global minimum.
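You can check this numerically in Octave with your own data. For a quadratic cost like this one, a standard result is that gradient descent converges only when alpha is below 2 divided by the largest eigenvalue of (1/m) * X' * X (the exact numbers below are approximate, not taken from your run):

    % Why alpha = 0.001 diverges for this particular X:
    X = [1 98; 1 94; 1 93; 1 88; 1 84; 1 82; 1 79];
    y = [97; 94; 94; 78; 85; 85; 76];

    lambda_max = max(eig((1/7) * (X' * X)))  % on the order of 7.8e3

    % Convergence requires alpha < 2 / lambda_max, which here is roughly 2.6e-4.
    % With alpha = 0.001 the error along the dominant eigen-direction is
    % multiplied by |1 - alpha * lambda_max| (about 6.8) every iteration,
    % so theta flips sign and grows by roughly that factor each step --
    % exactly the "exponentially larger" behavior you observed.

So your code is not wrong; alpha = 0.001 simply sits well above the stability threshold for these unscaled feature values (around 80 to 100).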
You need to find an alpha (learning rate) that is just right, so that the descent neither crawls nor diverges. The right value depends on the data, and scaling the features will help.
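Here is one possible fix, sketched in Octave: standardize the feature column first, which shrinks the eigenvalues of (1/m) * X' * X to around 1, so an ordinary learning rate like 0.1 becomes stable. This uses the same update rule from your question, just on the scaled copy of X:

    X = [1 98; 1 94; 1 93; 1 88; 1 84; 1 82; 1 79];
    y = [97; 94; 94; 78; 85; 85; 76];

    % Standardize the feature column (keep the intercept column of ones as-is).
    mu = mean(X(:, 2));
    sigma = std(X(:, 2));
    Xs = [X(:, 1), (X(:, 2) - mu) / sigma];

    theta = [1; 1];
    for i = 1:1000
      theta = theta - 0.1 * (1 / 7) * (Xs' * (Xs * theta - y));
    end
    theta  % converges instead of blowing up

Note that the resulting theta is expressed in the scaled feature space: to predict on a new x you must apply the same (x - mu) / sigma transform first, or convert theta back to the original scale. If you prefer not to scale, the alternative is simply a much smaller alpha (below roughly 2.6e-4 for this data).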