
Error in Gradient Descent Calculation


I tried to write a function to calculate gradient descent for a linear regression model. However, the answers I get do not match the answers from the normal equation method.

My sample data is:

df <- data.frame(c(1,5,6),c(3,5,6),c(4,6,8))

with c(4,6,8) being the y values.

lm_gradient_descent <- function(df, learning_rate, y_col = length(df), scale = TRUE) {
  n_features <- length(df) # n_features is the number of columns in the data set
  # using mean normalization to scale features
  if (scale == TRUE) {
    for (i in 1:n_features) {
      df[, i] <- (df[, i] - mean(df[, i])) / sd(df[, i])
    }
  }
  y_data <- df[, y_col]
  df[, y_col] <- NULL
  par <- rep(1, n_features)
  df <- merge(1, df)
  data_mat <- data.matrix(df)
  # we need a temp_arr to store each iteration of parameter values so that we can do a
  # simultaneous update
  temp_arr <- rep(0, n_features)
  diff <- 1
  while (diff > 0.0000001) {
    for (i in 1:n_features) {
      temp_arr[i] <- par[i] - learning_rate * sum((data_mat %*% par - y_data) * df[, i]) / length(y_data)
    }
    diff <- par[1] - temp_arr[1]
    print(diff)
    par <- temp_arr
  }
  return(par)
}

Running this function,

lm_gradient_descent(df,0.0001,,0)

the results I got were

c(0.9165891,0.6115482,0.5652970)

when I use the normal equation method, I get

c(-4,-1,3).
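For reference, here is a minimal sketch of the normal-equation computation in R on the sample data above (the names X and y are mine; it just solves the closed-form system (X'X) b = X'y):

X <- cbind(1, df[, 1], df[, 2]) # design matrix with an intercept column
y <- df[, 3]
solve(t(X) %*% X, t(X) %*% y)   # solves the normal equations (X'X) b = X'y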

Hope someone can shed some light on where I went wrong in this function.


Solution

  • You used the stopping criterion

    old parameters - new parameters <= 0.0000001
    

First of all, I think there's an abs() missing if you want to use this criterion (though my ignorance of R may be at fault). But even if you use

    abs(old parameters - new parameters) <= 0.0000001
    

this is not a good stopping criterion: it only tells you that progress has slowed down, not that the estimate is already sufficiently accurate. Try instead simply iterating for a fixed number of iterations; a fixed-iteration version is sketched below. Unfortunately, it is not easy to give a good, generally applicable stopping criterion for gradient descent here.
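For instance, here is a minimal sketch of the same update loop rewritten to run for a fixed number of iterations (n_iter is a name I introduce here; the gradient is written in vectorized form but is the same computation as in the question, without scaling):

    lm_gradient_descent_fixed <- function(df, learning_rate, n_iter = 100000, y_col = length(df)) {
      y_data <- df[, y_col]
      df[, y_col] <- NULL
      X <- data.matrix(cbind(1, df)) # design matrix with an intercept column
      par <- rep(1, ncol(X))
      for (iter in 1:n_iter) {
        # simultaneous update: the full gradient is computed before par changes
        gradient <- t(X) %*% (X %*% par - y_data) / length(y_data)
        par <- par - learning_rate * gradient
      }
      as.vector(par)
    }

With a suitable learning rate and enough iterations this should approach the normal-equation solution; how many iterations are enough still has to be judged per problem, which is exactly the difficulty mentioned above.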