This is my first time asking a question here. I'm trying to find the best K when running KNN, but the code I got from my professor doesn't seem to display the best K and the RMSE.
Below is what I typed in the console. I appreciate the help!
#rm(list=ls())
gc()
#setwd('/******/Desktop/Applied/isds 574/R')
dat = read.csv('cleaned.csv', stringsAsFactors=T, head=T)
#dropping Longitude and Latitude
dat$longitude = NULL
dat$latitude = NULL
dat$X = NULL
#Factors
dat$ocean_proxy_dummy = as.factor(dat$ocean_proxy_dummy)
# divide the data into 2 sets: training and validation
set.seed(1)
id.train = sample(1:nrow(dat), nrow(dat)*.6)
id.test = setdiff(1:nrow(dat), id.train)
#KNN2
library(FNN)
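# RMSE helper, defined here in case rmse() is not already available (FNN does not provide one)
rmse = function(yhat, y) sqrt(mean((yhat - y)^2))
Knn.reg.bestK = function(xtrain, xtest, ytrain, ytest, Kmax = 10) {
  vec.rmse = rep(NA, Kmax)
  for (K in 1:Kmax) {
    # FNN's function is knn.reg (lowercase); pass the loop variable K as k
    yhat.test = knn.reg(xtrain, xtest, ytrain, k = K)$pred
    vec.rmse[K] = rmse(yhat.test, ytest)
  }
  list(K.opt = which.min(vec.rmse), rmse.min = min(vec.rmse), vec.rmse = vec.rmse)
}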
You got this code from your professor, so I'm not sure what the intent of the exercise is. What I can tell you is that the code above only defines the function Knn.reg.bestK(). To actually see its results, you need to call that function on your data:
Knn.reg.bestK(xtrain, xtest, ytrain, ytest)
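But before you do that, you need to set those variables to the appropriate values. Note that id.train and id.test are vectors of row indices, not data frames, so use them to subset dat. Here "response" stands in for the name of your outcome column:
xtrain = dat[id.train, names(dat) != "response"] #This is a dataframe of predictors
ytrain = dat[id.train, "response"] #This is your outcome variable
xtest = dat[id.test, names(dat) != "response"]
ytest = dat[id.test, "response"]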
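To make the whole flow concrete, here is a minimal end-to-end sketch. It is not your professor's exact setup: the data frame toy and its columns x1, x2 and y are made up for illustration, the rmse() helper is my own definition, and I'm assuming the FNN package is installed. It shows the function being defined, the four inputs being built from row indices, and the result being printed.

```r
library(FNN)  # assumed installed; provides knn.reg()

# Root mean squared error between predictions and observed values
rmse = function(yhat, y) sqrt(mean((yhat - y)^2))

# Try K = 1..Kmax and record the test RMSE for each K
Knn.reg.bestK = function(xtrain, xtest, ytrain, ytest, Kmax = 10) {
  vec.rmse = rep(NA, Kmax)
  for (K in 1:Kmax) {
    yhat.test = knn.reg(xtrain, xtest, ytrain, k = K)$pred
    vec.rmse[K] = rmse(yhat.test, ytest)
  }
  list(K.opt = which.min(vec.rmse), rmse.min = min(vec.rmse), vec.rmse = vec.rmse)
}

# Made-up data standing in for cleaned.csv (hypothetical columns x1, x2, y)
set.seed(1)
toy = data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y = 2 * toy$x1 - toy$x2 + rnorm(200, sd = 0.5)

# Same 60/40 split as in the question
id.train = sample(1:nrow(toy), nrow(toy) * .6)
id.test = setdiff(1:nrow(toy), id.train)

# Use the row indices to subset the data; predictors as numeric matrices
xtrain = as.matrix(toy[id.train, c("x1", "x2")])
xtest = as.matrix(toy[id.test, c("x1", "x2")])
ytrain = toy$y[id.train]
ytest = toy$y[id.test]

# Calling the function is what actually produces the output
res = Knn.reg.bestK(xtrain, xtest, ytrain, ytest)
print(res$K.opt)     # K with the lowest test RMSE
print(res$rmse.min)  # that lowest RMSE
```

With your own data, keep in mind that knn.reg works on numeric predictors, so a factor column such as ocean_proxy_dummy may need to be converted to a numeric dummy before it goes into xtrain and xtest.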