Hi all, I need help troubleshooting the function below. I am using R.
The dataset I am using is called wages, which I load with library(ISLR) and data(wages).
I am trying to write a function that performs k-fold cross-validation on any general linear model.
The function signature is function(numberOfFolds, y, x, inputData).
y is the name of the dependent variable, x holds the names of the other variables in the dataset (the predictors), inputData is the dataset (wages here), and numberOfFolds is k.
I wrote the code below, but it returns NaN values and I am not sure what is going wrong. Could someone please help?
my.k.fold.1 <- function(numberOfFolds, y, x, inputData) {
  index <- sample(1:numberOfFolds, nrow(inputData), replace = T)
  inputData$index <- index
  mse <- vector('numeric', length = numberOfFolds)
  for (n in 1:numberOfFolds) {
    data.train <- inputData[index != n, ]
    data.test <- inputData[index == n, ]
    my.equation <- paste(y, paste(x, collapse = '+'), sep = '~')
    formula.1 <- formula(my.equation)
    model.test <- lm(formula.1, data = data.train)
    predictions <- predict(model.test, newdata = data.test)
    mse[[n]] <- mean((data.test$y - predictions)^2)
  }
  return(mse)
}
my.k.fold.1(numberOfFolds = 5, y='earn', x=c('race', 'sex', 'ed', 'height', 'age'), inputData = wages)
I would like to keep the arguments the same, so that I can pass the column names as strings in y and x.
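This happens because y is a string, so data.test$y is equivalent to data.test[["y"]]: it looks for a column literally named y. No such column exists, so the result is NULL, the subtraction yields an empty vector, and mean() of an empty vector is NaN. Replace it with data.test[[y]], which is equivalent to data.test$earn when y = "earn":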
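my.k.fold.1 <- function(numberOfFolds, y, x, inputData) {
  # Randomly assign each row to one of the folds
  index <- sample(1:numberOfFolds, nrow(inputData), replace = TRUE)
  inputData$index <- index
  mse <- vector('numeric', length = numberOfFolds)
  for (n in 1:numberOfFolds) {
    # Hold out fold n for testing, fit on the remaining folds
    data.train <- inputData[index != n, ]
    data.test <- inputData[index == n, ]
    # Build the formula from the column names, e.g. "earn~race+sex+..."
    my.equation <- paste(y, paste(x, collapse = '+'), sep = '~')
    formula.1 <- formula(my.equation)
    model.test <- lm(formula.1, data = data.train)
    predictions <- predict(model.test, newdata = data.test)
    # [[y]] looks up the column whose name is stored in y
    mse[[n]] <- mean((data.test[[y]] - predictions)^2)
  }
  return(mse)
}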
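If it helps to see the difference in isolation, here is a minimal sketch. The data frame df and the vector preds are made up for illustration and are not part of the wages data; only the column name earn is borrowed from the question.

```r
# Hypothetical toy data, used only to illustrate the column lookup
df <- data.frame(earn = c(10, 20, 30), age = c(25, 35, 45))
preds <- c(12, 18, 33)  # made-up predictions
y <- "earn"

by_dollar  <- df$y     # looks for a column literally named "y", finds none
by_bracket <- df[[y]]  # looks up the column named by the value of y

# NULL minus a vector gives an empty vector, and its mean is NaN
mse_dollar <- mean((by_dollar - preds)^2)
# The real column gives an ordinary mean squared error
mse_bracket <- mean((by_bracket - preds)^2)

print(is.null(by_dollar))
print(mse_dollar)
print(mse_bracket)
```

The same reasoning applies inside the loop: data.test$y silently returns NULL instead of raising an error, which is why the function ran to completion and only the MSE values looked wrong.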