machine-learning, octave, logistic-regression, gradient-descent

logistic regression with gradient descent error


I am trying to implement logistic regression with gradient descent.

I compute my cost function j_theta at each iteration, and fortunately j_theta decreases when I plot it against the number of iterations.
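
For reference, the cost I am computing is the standard cross-entropy for logistic regression:

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)}\log h_\theta(x^{(i)}) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big) \Big], \qquad h_\theta(x) = \frac{1}{1+e^{-\theta^\top x}}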

The data set I use is given below:

x=
1   20   30
1   40   60
1   70   30
1   50   50
1   50   40
1   60   40
1   30   40
1   40   50
1   10   20
1   30   40
1   70   70

y=   0
     1
     1
     1
     0
     1
     0
     0
     0
     0
     1

The code that I managed to write for logistic regression using gradient descent is:

%1. The code below loads the data file from disk into Octave's memory.
x=load('stud_marks.dat');
y=x(:,3);      % the third column holds the labels
x=x(:,1:2);    % the first two columns hold the features


%2. Now we add a column x0 of ones (the intercept term) to the matrix.
%First take the dimensions.
[m,n]=size(x);
x=[ones(m,1),x];

X=x;    % keep an unscaled copy of the features for plotting later


%   Now we feature-scale x1 and x2; we skip the first column x0 because it should stay as 1.
mn = mean(x);
sd = std(x);
x(:,2) = (x(:,2) - mn(2))./ sd(2);
x(:,3) = (x(:,3) - mn(3))./ sd(3);
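% (Scaling each feature to zero mean and unit standard deviation keeps the
% gradient steps for the two features on a comparable scale, so a single
% learning rate alpha works for both.)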

% We will not use the vectorized technique because it is hard to debug; we shall use many for loops instead.

max_iter=50;

theta = zeros(size(x(1,:)))';      % one parameter per column of x, initialised to 0
j_theta=zeros(max_iter,1);         % cost recorded at each iteration

for num_iter=1:max_iter
  % We calculate the cost Function
  j_cost_each=0;
  alpha=1;
%  theta
    for i=1:m
        z=0;
        for j=1:n+1
%            theta(j)
            z=z+(theta(j)*x(i,j));   % accumulate z = theta' * x(i,:) term by term
%            z
        end
        h= 1.0 ./(1.0 + exp(-z));    % sigmoid hypothesis
        j_cost_each=j_cost_each + ( (-y(i) * log(h)) -  ((1-y(i)) * log(1-h)) );   % cross-entropy term
%       j_cost_each
    end
    j_theta(num_iter)=(1/m) * j_cost_each;   % average cost for this iteration

    for j=1:n+1
        grad(j) = 0;
        for i=1:m
            z=(x(i,:)*theta);
%            z
            h=1.0 ./ (1.0 + exp(-z));
%            h
            grad(j) += (h-y(i)) * x(i,j);
        end
        grad(j)=grad(j)/m;           % average the accumulated gradient
%        grad(j)
        theta(j)=theta(j)- alpha * grad(j);
    end
end      

figure
plot(1:max_iter, j_theta, 'b', 'LineWidth', 2)   % cost against iteration number
hold off


figure
%3. In this step we plot the given input data set just to see the distribution of the two classes.
pos = find(y == 1);  % positions (indices) in y of every example with class value 1
neg = find(y == 0);  % positions (indices) in y of every example with class value 0
% Now we plot x1 vs x2 for y=1 and y=0
plot(x(pos, 2), x(pos,3), '+'); 
hold on
plot(x(neg, 2), x(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('x2 marks in subject 2')
legend('Passed', 'Failed')


plot_x = [min(x(:,2))-2,  max(x(:,2))+2];     % this min and max decide the extent of the decision line
% Calculate the decision boundary line
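% The boundary is the set of points where theta(1) + theta(2)*x1 + theta(3)*x2 = 0,
% so solving for x2 gives x2 = -(theta(2)*x1 + theta(1)) / theta(3).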
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off

%%%%%%% The only difference is that in the last plot I used X, whereas now I use x, whose features are feature-scaled %%%%%%%%%%%

If you view the graph of x1 vs x2, it looks like this:

[plot of x1 vs x2 showing the two classes]

After I run my code I get a decision boundary. The shape of the decision line seems to be okay, but it is a bit displaced. The graph of x1 vs x2 with the decision boundary is given below:

[plot of x1 vs x2 with the displaced decision boundary]

Please suggest where I am going wrong.

Thanks :)

The new graph:

[updated plot of x1 vs x2 with the decision boundary, drawn with the scaled features]

If you look at the new graph, the coordinates of the x axis have changed. That's because I used x (feature-scaled) instead of X.

Solution

  • The problem lies in your cost function calculation and/or your gradient calculation; your plotting function is fine. I ran your dataset through the logistic regression algorithm I implemented, but using the vectorized technique, because in my opinion it is easier to debug. The final values I got for theta were:

    theta = [-76.4242, 0.8214, 0.7948]

    I also used alpha = 0.3.

    I plotted the decision boundary and it looks fine. I would recommend using the vectorized form, as it is, in my opinion, easier to implement and to debug (see the sketch below).

    [decision boundary plot from the vectorized run]

    I also think your implementation of gradient descent is not quite correct: 50 iterations is just not enough, and the cost at the last iteration is not good enough. Maybe you should try running it for more iterations with a stopping condition. Also check this lecture for optimization techniques: https://class.coursera.org/ml-006/lecture/37
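
    For reference, below is a minimal sketch of what the vectorized version could look like. It assumes the scaled design matrix x (with the x0 column of ones) and the label vector y built in the question; the answer's actual code is not shown, so treat this as illustrative rather than definitive.

        % Vectorized logistic regression via gradient descent (illustrative sketch).
        % Assumes x is the m-by-3 scaled design matrix and y the m-by-1 labels.
        [m, n] = size(x);
        alpha    = 0.3;       % learning rate reported above
        max_iter = 5000;      % far more than 50 iterations
        tol      = 1e-6;      % stopping tolerance on the change in cost
        theta  = zeros(n, 1);
        j_prev = Inf;
        for num_iter = 1:max_iter
            h    = 1.0 ./ (1.0 + exp(-(x * theta)));                % all m hypotheses at once
            j    = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % average cross-entropy
            grad = (1/m) * (x' * (h - y));                          % whole gradient vector
            theta = theta - alpha * grad;                           % simultaneous update
            if abs(j_prev - j) < tol                                % simple stopping condition
                break
            end
            j_prev = j;
        end
        theta

    Note that here the whole gradient is computed from the current theta before any component is updated; in the question's loops, theta(j) is updated inside the j loop, so the gradient components for later j are computed from a partially updated theta.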