I have a test set with missing values in the class variable. When running knn I get the error message:
"Error in knn (...) no missing values are allowed".
Question: Why doesn't knn allow the values of the class variable in the test set to be missing? I mean, I don't know those values, I want to predict them. Can I just assign some class to the class variable and still get the correct result?
Example code:
library(class)
data <- data.frame("class_variable"=sample(LETTERS[1:2], 30, replace = TRUE),
"predictor_1" = runif(30),
"predictor_2" = runif(30))
train <- data[1:20,]
test <- data[21:30,]
test$class_variable <- NA
knn(train, test, train$class_variable)
Error in knn(train, test, train$class_variable) : no missing values are allowed
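train and test must have the same number of columns, and neither may contain NA values, because knn computes distances over every column it is given. The class labels are passed separately through the third argument (cl), so they do not belong in either matrix. The way to do it is to exclude the class_variable column from both train and test when passing them to knn. This will work: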
knn(train[, -1], test[, -1], train[, 1])
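For completeness, here is a minimal end-to-end sketch of the same idea that selects the predictor columns by name rather than by position and writes the predictions back into the test set. The seed, the choice of k = 3, and the helper variable names (predictors, pred) are assumptions made for illustration and are not part of the original example.

```r
library(class)

set.seed(42)  # assumption: fixed seed so the example is reproducible

data <- data.frame(
  class_variable = sample(LETTERS[1:2], 30, replace = TRUE),
  predictor_1 = runif(30),
  predictor_2 = runif(30)
)

train <- data[1:20, ]
test <- data[21:30, ]
test$class_variable <- NA  # labels in the test set are unknown

# Only the predictor columns go into the distance computation.
predictors <- c("predictor_1", "predictor_2")

# The known training labels are passed separately through cl.
pred <- knn(
  train = train[, predictors],
  test = test[, predictors],
  cl = factor(train$class_variable),
  k = 3  # assumption: knn() defaults to k = 1
)

# Fill in the previously missing class values with the predictions.
test$class_variable <- pred
```

Selecting columns by name avoids depending on the class column being first, which is what train[, -1] relies on.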