
How to write cost function formula from Andrew Ng assignment in Octave?


My implementation (see below) gives the scalar value 3.18, which is not the right answer. The value should be 0.693. Where does my code deviate from the equation?

Here is the setup code from the assignment that loads the data and calls the cost function in Octave:

data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
[m, n] = size(X);
X = [ones(m, 1) X];
initial_theta = zeros(n + 1, 1);
[cost, grad] = costFunction(initial_theta, X, y);

The data file (ex2data1.txt) is included in the assignment package: data link.

The formula for the cost function is

J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]

where h_\theta(x) = g(\theta^T x) and g is the sigmoid function.
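
This also explains the expected value quoted above (a short check, not part of the original assignment text): with initial_theta all zeros, \theta^T x^{(i)} = 0, so h_\theta(x^{(i)}) = g(0) = 0.5 for every example, and

J(\mathbf{0}) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log 0.5 - \left(1 - y^{(i)}\right) \log 0.5 \right] = -\log 0.5 = \log 2 \approx 0.693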

Here is the code I am using:

function [J, grad] = costFunction(theta, X, y)

m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0; %#ok<NASGU>
grad = zeros(size(theta)); %#ok<NASGU>

hx = sigmoid(X * theta)';
m = length(X);

J = sum(-y' * log(hx) - (1 - y')*log(1 - hx)) / m;

grad = X' * (hx - y) / m;

end

Here is the sigmoid function:

function g = sigmoid(z)
g = 1/(1+exp(-z));
end

Solution

  • Your sigmoid function is incorrect. The incoming argument is a vector, but the division you are using is matrix division. It needs to be element-wise:

    function g = sigmoid(z)
        g = 1.0 ./ (1.0 + exp(-z));
    end
    

    By doing 1 / A where A is an expression, you are in fact computing the inverse of A. Since inverses only exist for square matrices, this will compute the pseudo-inverse, which is definitely not what you want.
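
    You can see the difference right at the Octave prompt (a minimal illustration; v is just made-up data standing in for 1 + exp(-z)):

    v = [1; 2; 4];       % a column vector, like 1 + exp(-z) inside sigmoid
    1 / v                % matrix division: the pseudo-inverse of v, a 1x3 row vector [0.0476 0.0952 0.1905]
    1 ./ v               % element-wise division: [1.0000; 0.5000; 0.2500], one value per element
    sigmoid([-1; 0; 1])  % with the fix: [0.2689; 0.5000; 0.7311]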

    You can keep most of your costFunction code the same, since you're already using the dot product: y' * log(hx) sums over all of the training examples by itself. I would get rid of the sum since it's implied by the dot product. I'll mark my changes with comments:

    function [J, grad] = costFunction(theta, X, y)
    
    m = length(y); % number of training examples
    
    % You need to return the following variables correctly 
    %J = 0; %#ok<NASGU> <-- Don't need to declare this as you'll create the variables later
    %grad = zeros(size(theta)); %#ok<NASGU>
    
    hx = sigmoid(X * theta);  % <-- Remove transpose
    %m = length(X); % <-- Redundant: m is already set from length(y) above
    
    J = (-y' * log(hx) - (1 - y')*log(1 - hx)) / m; % <-- Remove sum
    
    grad = X' * (hx - y) / m;
    
    end
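
    With both fixes in, you can sanity-check the result without the course data: for theta = zeros, the hypothesis is 0.5 for every example, so the cost is -log(0.5) ≈ 0.693 no matter what X contains. A minimal check (the X and y below are made-up toy data, not the values from ex2data1.txt):

    X = [ones(3, 1), [1 2; 3 4; 5 6]];  % 3 examples: intercept column plus 2 features
    y = [0; 1; 1];
    theta = zeros(3, 1);                % matches initial_theta in the assignment setup
    [J, grad] = costFunction(theta, X, y);
    disp(J)                             % prints 0.6931, i.e. log(2)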