Tags: machine-learning, octave, logistic-regression, gradient-descent

Logistic Regression using Gradient Descent with OCTAVE


I've gone through a few of Professor Andrew Ng's machine learning courses and viewed the transcript for logistic regression using Newton's method. However, when implementing logistic regression using gradient descent, I run into an issue.

The cost-function graph it generates is not convex.

I am using the vectorized implementation of the equations; my code goes as follows.
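For reference, these are the vectorized hypothesis, cost, and update rule the code below implements:

$$h = g(x\theta) = \frac{1}{1 + e^{-x\theta}}$$

$$J(\theta) = \frac{1}{m}\left(-y^{\top}\log h - (1-y)^{\top}\log(1-h)\right)$$

$$\theta \leftarrow \theta - \frac{\alpha}{m}\,x^{\top}(h-y)$$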

% 1. Load the training data into Octave's workspace
x=load('ex4x.dat');
y=load('ex4y.dat');

% 2. Add an intercept column x0 (all ones) to the feature matrix.
% First take the number of training examples.
m=length(y);
x=[ones(m,1),x];

alpha=0.1;
max_iter=100;
g=inline('1.0 ./ (1.0 + exp(-z))');
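% Note: 'inline' is deprecated in newer releases of Octave/MATLAB; an anonymous
% function, g = @(z) 1.0 ./ (1.0 + exp(-z));, behaves the same way here.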

theta = zeros(size(x(1,:)))';   % theta must be a 3x1 vector so that x (m x 3) can multiply it
j=zeros(max_iter,1);            % j stores the value of the cost function J(theta) at each iteration

for num_iter=1:max_iter
    % The hypothesis is computed inside the loop because it must be
    % recomputed with the updated theta on every iteration
    z=x*theta;
    h=g(z);     % the sigmoid defined above via inline is applied here

    j(num_iter)=(1/m)*(-y'* log(h) - (1 - y)'*log(1-h));   % vectorized form of the cost function J(theta)
    j                                % echo the cost each iteration
    grad=(1/m) * x' * (h-y);         % vectorized gradient of J(theta)
    theta=theta - alpha .* grad;     % gradient descent update of theta
    theta                            % echo theta each iteration
end
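The non-convex graph comes from plotting the stored costs; my exact plotting code is not shown above, but it is along these lines (a sketch):

figure;
plot(1:max_iter, j);    % cost J(theta) at each iteration
xlabel('iteration');
ylabel('J(theta)');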

The code per se doesn't throw any error, but it does not produce a proper convex graph.

I would be glad if anybody could point out the mistake or share some insight into what's causing the problem.

Thanks.

[Image: the non-convex cost plot produced by the code]


Solution

  • Two things you need to look into:

    1. Machine learning involves learning patterns from data. If your files ex4x.dat and ex4y.dat are randomly generated, they won't contain patterns that you can learn.
    2. You have used variable names like g, h, i, and j, which make debugging difficult. Since it's a very small program, it might be a better idea to rewrite it.

    Here's my code (it uses Newton's method rather than plain gradient descent), which gives the convex plot:

    clc; clear; close all;
    
    load q1x.dat;
    load q1y.dat;
    
    X = [ones(size(q1x, 1), 1) q1x];   % prepend the intercept column
    Y = q1y;
    
    m = size(X, 1);       % number of training examples
    n = size(X, 2) - 1;   % number of features (excluding the intercept)
    
    % initialize
    theta = zeros(n+1, 1);
    thetaold = ones(n+1, 1);
    
    % iterate until the squared change in theta is small enough
    while ( ((theta - thetaold)' * (theta - thetaold)) > 0.0000001 )
        % gradient of the log-likelihood with respect to theta
        dellltheta = zeros(n+1, 1);
        for j = 1:n+1,
            for i = 1:m,
                h_i = 1 / (1 + exp(-theta' * X(i,:)'));   % sigmoid for example i
                dellltheta(j,1) = dellltheta(j,1) + (Y(i,1) - h_i) * X(i,j);
            end;
        end;
        % Hessian of the log-likelihood
        H = zeros(n+1, n+1);
        for j = 1:n+1,
            for k = 1:n+1,
                for i = 1:m,
                    h_i = 1 / (1 + exp(-theta' * X(i,:)'));   % sigmoid for example i
                    H(j,k) = H(j,k) - h_i * (1 - h_i) * X(i,j) * X(i,k);
                end;
            end;
        end;
        thetaold = theta;
        theta = theta - H \ dellltheta;             % Newton's update; H \ b avoids forming inv(H)
        (theta - thetaold)' * (theta - thetaold)    % display the convergence measure
    end
    

    I get the following values of the convergence measure (theta-thetaold)'*(theta-thetaold) after each iteration:

    2.8553
    0.6596
    0.1532
    0.0057
    5.9152e-06
    6.1469e-12
    

    Which, when plotted, looks like this: [Image: error vs. iteration]
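
    As an aside, the nested loops above can be replaced with a vectorized Newton step. This is a minimal sketch of my own (not what produced the output above), assuming X, Y, and theta are defined as in the code:

    h = 1 ./ (1 + exp(-X * theta));     % m x 1 vector of sigmoid predictions
    dellltheta = X' * (Y - h);          % gradient of the log-likelihood
    H = -X' * diag(h .* (1 - h)) * X;   % Hessian of the log-likelihood
    theta = theta - H \ dellltheta;     % Newton update

    Each pass through this block performs exactly one iteration of the while loop above.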