I am trying to fit a logistic regression, using fminunc to optimize the parameters. The code never fails, but fminunc seems to run only a single iteration and does not converge. The exit flag that the fminunc function returns is -3, which, according to the documentation, means "The trust region radius became excessively small". What does this mean and how can I fix it?
This is my main:
load('data.mat');
% provides the data matrix X and the label vector surv used below
% Initialize parameters
[m, n] = size(X);
X = [ones(m, 1), X];
initialTheta = zeros(n + 1, 1);
alpha = 1;
lambda = 0;
costfun = @(t) costFunction(t, X, surv, lambda, alpha);
options = optimset('GradObj', 'on', 'MaxIter', 1000);
[theta, cost, info] = fminunc(costfun, initialTheta, options);
And the cost function:
function [J, grad] = costFunction(theta, X, y, lambda, alpha)
%COSTFUNCTION Implements a logistic regression cost function.
% [J grad] = COSTFUNCTION(initialParameters, X, y, lambda) computes the cost
% and the gradient for the logistic regression.
%
m = size(X, 1);
J = 0;
grad = zeros(size(theta));
% un-regularized
z = X * theta;
J = (-1 / m) * y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z));
grad = (alpha / m) * X' * (sigmoid(z) - y);
% regularization
theta(1) = 0;
J = J + (lambda / (2 * m)) * (theta' * theta);
grad = grad + alpha * ((lambda / m) * theta);
endfunction
Any help is much appreciated.
There are a few issues with the code above. To answer the exit-flag question first: fminunc's trust-region algorithm repeatedly proposes a step within a trial radius, and exit flag -3 means it kept shrinking that radius without ever finding a step that actually reduced the cost. That is the classic symptom of a gradient that is inconsistent with the cost function, which is exactly what is happening here.
Using fminunc means you don't have to provide an alpha: the optimizer chooses its own step sizes, so a learning rate does not belong in the gradient. Remove all instances of it from the code, and your gradient lines should look like the following
grad = (1 / m) * X' * (sigmoid(z) - y);
and
grad = grad + ((lambda / m) * theta); % This isn't quite correct, see below
In the regularization of the grad, you can't use theta directly, because the bias parameter (j = 0, i.e. theta(1)) must not be regularized. There are a number of ways to do this, but here is one
temp = theta;
temp(1) = 0;
grad = grad + ((lambda / m) * temp);
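An equivalent one-liner, if you prefer not to copy theta, is to zero out the bias entry inline (indexing from 2 assumes theta is a column vector, as in your code):
grad = grad + (lambda / m) * [0; theta(2:end)]; % skip the bias term theta(1)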
You're missing a set of parentheses in your cost function: the (-1 / m) is applied only to the first term, and the second term loses its sign and scaling entirely. It should look like this
J = (-1 / m) * ( y' * log(sigmoid(z)) + (1 - y)' * log(1 - sigmoid(z)) );
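Putting all of the fixes together, the cost function might look like the sketch below. It assumes your sigmoid helper is the usual g = 1 ./ (1 + exp(-z)); note that alpha is gone from the signature:
function [J, grad] = costFunction(theta, X, y, lambda)
%COSTFUNCTION Regularized logistic regression cost and gradient.
m = size(X, 1);
h = sigmoid(X * theta);
% un-regularized; (-1 / m) now scales the whole log-likelihood
J = (-1 / m) * ( y' * log(h) + (1 - y)' * log(1 - h) );
grad = (1 / m) * X' * (h - y);
% regularization, skipping the bias term theta(1)
temp = theta;
temp(1) = 0;
J = J + (lambda / (2 * m)) * (temp' * temp);
grad = grad + (lambda / m) * temp;
endfunction
Remember to drop alpha from the call in your main as well: costfun = @(t) costFunction(t, X, surv, lambda);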
And finally, as a nit, a lambda value of 0 means that your regularization does nothing.
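One last tip: if fminunc ever exits with -3 again, compare your analytic gradient against a centered finite-difference approximation at a random point; if the two disagree, the gradient code is wrong. A minimal sketch, reusing the variables from your main (the step size 1e-6 and the random test point are arbitrary choices):
thetaTest = randn(size(initialTheta));
[~, g] = costFunction(thetaTest, X, surv, lambda);
step = 1e-6;
gNum = zeros(size(thetaTest));
for j = 1:numel(thetaTest)
  e = zeros(size(thetaTest));
  e(j) = step;
  gNum(j) = (costFunction(thetaTest + e, X, surv, lambda) ...
             - costFunction(thetaTest - e, X, surv, lambda)) / (2 * step);
end
disp(max(abs(g - gNum))) % should be tiny, e.g. below 1e-8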