machine-learning octave linear-regression gradient-descent

Gradient Descent failing for multiple variables, results in NaN


I am trying to implement the gradient descent algorithm to minimize the cost function for multiple linear regression. I am using the concepts explained in the machine learning class by Andrew Ng, and I am using Octave. However, when I execute the code it fails to provide a solution: my theta values compute to "NaN". I have attached the cost function code and the gradient descent code. Can someone please help?

Cost function :

function J = computeCostMulti(X, y, theta)

m = length(y); % number of training examples

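J = 0;

h = X * theta;            % predictions for all training examples
s = sum((h - y) .^ 2);    % sum of squared errors
J = s / (2 * m);          % halved mean squared error

end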

Gradient Descent Code:

function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)

m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters

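  a = X * theta - y;        % residuals: predictions minus targets
  b = alpha * (X' * a);     % gradient direction scaled by the learning rate
  theta = theta - (b / m);  % simultaneous update of all parameters

  J_history(iter) = computeCostMulti(X, y, theta);
end

end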
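
For reference, the two functions above appear to correspond to the usual vectorized forms of the cost and of the update rule. This is only a restatement of the code in math notation, using the same symbols as the code ($X$, $y$, $\theta$, $\alpha$, $m$):

```latex
J(\theta) = \frac{1}{2m} (X\theta - y)^{\top} (X\theta - y),
\qquad
\theta \leftarrow \theta - \frac{\alpha}{m} X^{\top} (X\theta - y)
```

Read this way, the logic of both functions looks consistent with the formulas, which suggests looking at the inputs (for example the scale of the columns of $X$) rather than at the update itself.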

Solution

  • I found the bug, and it was not in the logic of either the cost function or the gradient descent function. It was in the feature normalization logic: I was accidentally returning the wrong variable, and that was causing the output to be "NaN".

    It was a dumb mistake.

    What I was doing previously:

    mu= mean(a);
    sigma = std(a);
    b=(X.-mu);
    X= b./sigma;
    

    Instead, what I should be doing:

    function [X_norm, mu, sigma] = featureNormalize(X)
    %FEATURENORMALIZE Normalizes the features in X 
    %   FEATURENORMALIZE(X) returns a normalized version of X where
    %   the mean value of each feature is 0 and the standard deviation
    %   is 1. This is often a good preprocessing step to do when
    %   working with learning algorithms.
    
    % You need to set these values correctly
    X_norm = X;
    mu = zeros(1, size(X, 2));
    sigma = zeros(1, size(X, 2));
    
    % ====================== YOUR CODE HERE ======================
    
    
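    mu = mean(X);          % per-feature (column) means
    sigma = std(X);        % per-feature standard deviations
    a = X - mu;            % center each column (broadcasts across rows)
    X_norm = a ./ sigma;   % scale each column to unit standard deviation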
    
    % ============================================================
    
    end
    

    So clearly I should be using X_norm instead of X, and that is what was causing the code to give the wrong output.
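
A plausible explanation for why working with the un-normalized X ends in NaN: when features are on the scale of thousands, a fixed learning rate can make every update overshoot, so theta grows until it overflows to Inf, and Inf - Inf then produces NaN. The sketch below illustrates this with the same update rule as in the question, run once on raw features and once on normalized ones. The dataset and the values of alpha and num_iters are made up for illustration and are not the original data; the names X_raw, theta_raw and theta_norm are likewise hypothetical.

```octave
% Toy data (assumed for illustration): two features on very different scales.
X_raw = [2104 3; 1600 3; 2400 3; 1416 2; 3000 4];
y = [399900; 329900; 369000; 232000; 539900];
m = length(y);
alpha = 0.01;      % assumed learning rate
num_iters = 400;   % assumed number of iterations

% Case 1: raw features, as if X had been used instead of X_norm.
X1 = [ones(m, 1) X_raw];            % add the intercept column
theta_raw = zeros(3, 1);
for iter = 1:num_iters
  theta_raw = theta_raw - (alpha / m) * (X1' * (X1 * theta_raw - y));
end

% Case 2: normalized features. Normalize first, then add the intercept
% column, so the constant column (whose std is 0) is never divided by sigma.
mu = mean(X_raw);
sigma = std(X_raw);
X_norm = (X_raw - mu) ./ sigma;
X2 = [ones(m, 1) X_norm];
theta_norm = zeros(3, 1);
for iter = 1:num_iters
  theta_norm = theta_norm - (alpha / m) * (X2' * (X2 * theta_norm - y));
end

disp(any(isnan(theta_raw)));    % raw features: theta has diverged to NaN
disp(any(isnan(theta_norm)));   % normalized features: theta stays finite
```

Checking `any(isnan(theta))` or plotting J_history after a run is a quick way to tell whether the normalized matrix is really the one being passed to gradient descent.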