I'm currently taking Andrew Ng's machine learning course and I try implementing the stuff as I learn so as not to forget them, I just finished regularization (chapter 7). I know that theta 0 is updated normally, separate from other parameters, however, I am not sure which of these is the correct implementation.
Implementation 1: in my gradient function, after computing the regularization vector, change theta 0 part to 0 so when it is added to the total, it is as if theta 0 was never regularized.
Implementation 2: store theta in a temp variable: _theta, update it with a reg_step of 0 (so it's as if there's no regularization), store the new theta 0 in a temp variable: t1, then update the original theta value with my desired reg_step and replace theta 0 with t1 (value from non-regularized update).
below is my code for the first implementation, it's not meant to be advanced, I'm just practicing: I'm using octave which is 1-index, so theta(1) is theta(0)
function ret = gradient(X,Y,theta,reg_step),
H = theta' * X;
dif = H-Y;
mul = dif .* X;
total = sum(mul,2);
m=(size(Y)(1,1));
regular = (reg_step/m)*theta;
regular(1)=0;
ret = (total/m)+regular,
endfunction
Thanks in advance.
A slight tweak to the first implementation worked for me.
First, calculate regularization for every theta. Then go on to perform gradient step and later you can change the first entry of the matrix containing gradients manually to ignore regularization for theta_0.
% Calculate regularization
regularization = (reg_step / m) * theta;
% Gradient Step
gradients = (1 / m) * (X' * (predictions - y)) + regularization;
% Ignore regularization in theta_0
gradients(1) = (1 / m) * (X(:, 1)' * (predictions - y));