Tags: machine-learning, linear-regression, gradient-descent

Gradient descent for more than 2 theta values


The gradient descent algorithm is given as:

$$
\text{repeat until convergence:} \quad
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr) \\
\theta_1 &:= \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x^{(i)}
\end{aligned}
$$

(taken from Andrew Ng's Coursera course). How should this algorithm be implemented if there are more than two theta parameters (feature weights)?

Should an extra theta update be included:

$$
\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\, x_2^{(i)}
$$

and then repeat until convergence, in other words, until theta0, theta1, and theta2 no longer change?
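
In code, I imagine the elementwise scheme would generalize to any number of parameters like this (a minimal NumPy sketch; the function name, learning rate, and iteration count are my own assumptions, not from the course):

```python
import numpy as np

def gradient_descent_loop(X, y, alpha=0.01, iters=1000):
    """Elementwise gradient descent: one update per theta_j, generalizing
    the per-parameter equations above to n parameters.
    X is the design matrix with a leading column of ones (so theta[0] is
    the intercept term theta_0)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = X @ theta                   # predictions h_theta(x^(i)) for all i
        new_theta = theta.copy()        # simultaneous update: compute every j
        for j in range(n):              # before overwriting theta
            new_theta[j] = theta[j] - (alpha / m) * np.sum((h - y) * X[:, j])
        theta = new_theta
    return theta
```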


Solution

  • Convert theta to a vector, and the whole update collapses into a single matrix equation:

    $$\theta := \theta - \frac{\alpha}{m}\, X^T \bigl( h_\theta(X) - y \bigr), \qquad h_\theta(X) = X\theta$$


    Andrew Ng's notation is meant to be clear to those less comfortable with matrix notation, which I doubt includes yourself. –

    The matrix formulation, a single equation instead of several, may be clearer than the per-parameter equations in the question. It shows that the update is effectively an atomic, simultaneous operation across all columns of the design matrix; it is the responsibility of the underlying linear algebra library to make that happen, as sketched below.
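
    A minimal vectorized sketch of that single-equation update in Python/NumPy (the function name, hyperparameter values, and the synthetic data are illustrative assumptions):

    ```python
    import numpy as np

    def gradient_descent_vectorized(X, y, alpha=0.01, iters=1000):
        """One matrix update per iteration: theta := theta - (alpha/m) * X^T (X theta - y)."""
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(iters):
            theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        return theta

    # Tiny usage example on synthetic data (hypothetical values, for illustration only)
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # bias column + 2 features
    true_theta = np.array([1.0, 2.0, -3.0])
    y = X @ true_theta + rng.normal(scale=0.1, size=100)
    print(gradient_descent_vectorized(X, y, alpha=0.1, iters=2000))  # approx [1, 2, -3]
    ```

    Note there is no inner loop over j: the matrix product `X.T @ (X @ theta - y)` computes all n partial sums at once, which is exactly the "atomic operation across all columns" described above.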