My implementation (see below) gives the scalar value 3.18, which is not the right answer. The value should be 0.693. Where does my code deviate from the equation?
Here is how I load the data and call the cost function in Octave:
data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
[m, n] = size(X);
X = [ones(m, 1) X];
initial_theta = zeros(n + 1, 1);
[cost, grad] = costFunction(initial_theta, X, y);
The data comes from the ex2data package; the data file itself is here: data link.
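The formula for the cost function is $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$, where $h_\theta(x) = g(\theta^T x)$ and $g$ is the sigmoid function.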
Here is the code I am using:
function [J, grad] = costFunction(theta, X, y)
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0; %#ok<NASGU>
grad = zeros(size(theta)); %#ok<NASGU>
hx = sigmoid(X * theta)';
m = length(X);
J = sum(-y' * log(hx) - (1 - y')*log(1 - hx)) / m;
grad = X' * (hx - y) / m;
Here is the sigmoid function:
function g = sigmoid(z)
g = 1/(1+exp(-z));
Your sigmoid function is incorrect. The input is a vector, but the / operator you are using performs matrix division. The division needs to be element-wise:
function g = sigmoid(z)
g = 1.0 ./ (1.0 + exp(-z));
By writing 1 / A, where A is a matrix or vector expression, you are in fact computing the inverse of A. Since inverses only exist for square matrices, for a vector this solves a least-squares system instead (effectively a pseudo-inverse), which is definitely not what you want.
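As a rough illustration of the difference, here is a minimal sketch using a made-up 4-element input (the variable names g_wrong and g_right are only for this example). It compares the shapes produced by the two operators rather than relying on any particular values from the matrix division:

```octave
% Hypothetical input: a column vector of zeros, as X * theta would be for theta = 0
z = zeros(4, 1);

% Matrix right division: solves x * A = 1, so the result is a 1x4 row vector
g_wrong = 1 / (1 + exp(-z));

% Element-wise division: same shape as z, each entry is 1 / (1 + 1) = 0.5
g_right = 1 ./ (1 + exp(-z));

disp(size(g_wrong));   % 1 4
disp(size(g_right));   % 4 1
```

The changed shape is a quick way to spot that / did something other than divide each element.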
You can keep most of your costFunction code the same, since you're using the dot product. I would get rid of the sum, since the summation is already implied by the dot product, and remove the transpose on hx so that it stays a column vector like y. I'll mark my changes with comments:
function [J, grad] = costFunction(theta, X, y)
m = length(y); % number of training examples
% You need to return the following variables correctly
%J = 0; %#ok<NASGU> <-- Don't need to declare this as you'll create the variables later
%grad = zeros(size(theta)); %#ok<NASGU>
hx = sigmoid(X * theta); % <-- Remove transpose
%m = length(X); % <-- Redundant: m is already set above, and length(X) returns the largest dimension of X
J = (-y' * log(hx) - (1 - y')*log(1 - hx)) / m; % <-- Remove sum
grad = X' * (hx - y) / m;
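One way to sanity-check the expected value of 0.693: with theta set to all zeros, every prediction is sigmoid(0) = 0.5, so the cost reduces to -log(0.5) = log(2) ≈ 0.6931 no matter what the data is. The sketch below assumes a small made-up data set in place of ex2data1.txt and defines sigmoid inline as an anonymous function so that it runs on its own:

```octave
% Element-wise sigmoid, defined inline for this sketch
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Made-up data (NOT ex2data1.txt): 4 examples, 2 features, plus an intercept column
X = [ones(4, 1), [1 2; 3 4; 5 6; 7 8]];
y = [0; 1; 1; 0];
theta = zeros(3, 1);

m = length(y);
hx = sigmoid(X * theta);                            % 4x1 column, every entry 0.5
J = (-y' * log(hx) - (1 - y') * log(1 - hx)) / m;   % scalar cost
grad = X' * (hx - y) / m;                           % 3x1 gradient

fprintf('%.4f\n', J);   % prints 0.6931, i.e. log(2)
```

If the cost at zero theta is anything other than log(2), the problem is in sigmoid or in the cost expression rather than in the data.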