I'm trying to interpret the results I get when cross-validating a k-nearest neighbors model. My data set is set up like this:
variable1 (int) | variable2 (int) | variable3 (int) | variable4 (int) | Response (factor)
I split my data 80/20: 80% into cvdata for cross-validation and 20% held out for testing once I've chosen my model.
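For context, the split itself was done along these lines (a base-R sketch; mydata is a stand-in name for the full data frame):

set.seed(42)                                           # reproducible split
train_idx <- sample(nrow(mydata), floor(0.8 * nrow(mydata)))
cvdata   <- mydata[train_idx, ]                        # 80% for cross-validation
testdata <- mydata[-train_idx, ]                       # 20% held out for final testing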
A single iteration of my code is below:

library(kknn)  # cv.kknn comes from the kknn package
cv <- cv.kknn(formula = Response ~ ., cvdata, kcv = 10, k = 7, kernel = 'optimal', scale = TRUE)
cv
When I print cv, it just returns a list whose first element has some seemingly random numbers as the row names, alongside the observed response (y) and the predicted response (yhat). I'm trying to calculate some sort of accuracy estimate before I touch the test set. Should I be comparing y to yhat to validate?
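For reference, this is roughly how I'm pulling those pairs out to inspect them (df is just my name for the extracted table):

df <- as.data.frame(cv[[1]])   # observed y next to cross-validated yhat
head(df)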
EDIT: output added below
[[1]]
y yhat
492 1 0.724282776
654 0 0.250394372
427 0 0.125159894
283 0 0.098561768
218 1 0.409990851
[[2]]
[1] 0.2267058 0.1060212
The first element of [[2]] is the mean absolute error and the second is the mean squared error. The "random" numbers in [[1]] are just the row indices of the observations that landed in each fold, carried over from your original data. If df is the data frame of y/yhat pairs, those two values can easily be verified with mean(abs(df$y - df$yhat)) and mean((df$y - df$yhat)^2).
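Concretely, here is a short sketch tying it together (assuming cv is the object returned above and that Response is a binary 0/1 outcome; the 0.5 cutoff is my assumption, pick whatever threshold suits your problem):

df <- as.data.frame(cv[[1]])       # per-observation y and yhat from the folds
mean(abs(df$y - df$yhat))          # mean absolute error, reproduces cv[[2]][1]
mean((df$y - df$yhat)^2)           # mean squared error, reproduces cv[[2]][2]
mean((df$yhat > 0.5) == df$y)      # cross-validated accuracy after thresholding yhat

Once you're happy with k and the kernel, refit on all of cvdata and score the held-out 20% the same way.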