
"Error in knn (...) no missing values are allowed", why?


I have a test set with missing values in the class variable. When running knn I get the error message:

"Error in knn (...) no missing values are allowed".

Question: Why doesn't knn allow the values of the class variable in the test set to be missing? I mean, I don't know those values, I want to predict them. Can I just assign some class to the class variable and still get the correct result?

Example code:

library(class)
data <- data.frame("class_variable" = sample(LETTERS[1:2], 30, replace = TRUE),
                   "predictor_1" = runif(30),
                   "predictor_2" = runif(30))
train <- data[1:20,]
test <- data[21:30,]

test$class_variable <- NA

knn(train, test, train$class_variable)

Error in knn(train, test, train$class_variable) : no missing values are allowed


Solution

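  • knn treats every column of train and test as a predictor, so both must have the same columns and neither may contain NA values. The class labels are passed separately through the third argument (cl), which means the test set does not need a class column at all, and there is no need to fill in placeholder classes. Exclude the class_variable column from both train and test when passing them to knn. This will work: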

    knn(train[, -1], test[, -1], train[, 1])
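
For reference, here is a slightly fuller sketch of the same fix using the data from the question. The seed, the choice of k = 3, and the variable names predictors and pred are illustrative assumptions, not part of the original answer.

```r
library(class)

set.seed(42)  # assumed seed, only so the random example is repeatable

data <- data.frame(class_variable = sample(LETTERS[1:2], 30, replace = TRUE),
                   predictor_1 = runif(30),
                   predictor_2 = runif(30))
train <- data[1:20, ]
test  <- data[21:30, ]

# The classes of the test rows are unknown, as in the question.
test$class_variable <- NA

# Pass only the predictor columns to knn; the labels go in cl.
predictors <- c("predictor_1", "predictor_2")
pred <- knn(train = train[, predictors],
            test  = test[, predictors],
            cl    = factor(train$class_variable),
            k     = 3)  # k = 3 is an arbitrary illustrative choice

# Store the predictions where the NA placeholders were.
test$class_variable <- pred
```

Selecting the predictors by name rather than with -1 keeps the call correct even if the column order of the data frame changes.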