Error when fitting KNN model


I am trying to fit a KNN model to the faithful data set in R. My code is:

smp_size <- floor(0.5 * nrow(faithful))
set.seed(123)
train_ind <- sample(seq_len(nrow(faithful)), size = smp_size)
train_data = faithful[train_ind, ]
test_data = faithful[-train_ind, ]

pred = FNN::knn.reg(train = train_data[,1], 
                  test = test_data[,1], 
                  y = train_data[,2], k = 5)$pred

The faithful data has only 2 columns. I get this error: "Error in get.knnx(train, test, k, algorithm) : Number of columns must be same!."

I don't understand why this error occurs, because the train and test data have the same columns.

Thanks in advance for any answers!


Solution

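  • ?knn.reg says that train and test have to be a data frame or a matrix. In your case there is only one independent variable, so train_data[,1] is no longer a data frame: selecting a single column drops the result to a plain numeric vector, as str(train_data[,1]) shows. The solution is to wrap the train and test arguments in as.data.frame when calling knn.reg.

    Another important point is that KNN is distance-based, so you should generally normalize your data before running it. With a single predictor, as here, scaling does not change which neighbours are nearest, but it matters as soon as you have several predictors on different scales. You could try the snippet below as a minor improvement to your code: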

    library(FNN)
    set.seed(123)

    # Normalize the predictor(s); the response is the last column
    X <- scale(faithful[, -ncol(faithful)])
    y <- faithful[, ncol(faithful)]

    # Split the row indices into train (70%) and test (30%)
    train_ind <- sample(seq_len(nrow(faithful)), floor(0.7 * nrow(faithful)))
    test_ind <- setdiff(seq_len(nrow(faithful)), train_ind)

    # Run the KNN regression; drop = FALSE keeps X a one-column matrix
    knn_model <- knn.reg(train = as.data.frame(X[train_ind, , drop = FALSE]),
                         test = as.data.frame(X[test_ind, , drop = FALSE]),
                         y = y[train_ind],
                         k = 5)
    pred <- knn_model$pred
    


    Hope this helps!
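
To see the dimension drop described above for yourself, here is a minimal base R sketch (no FNN needed). It reuses the 50/50 split from the question; the names x_vec and x_df are made up for illustration only.

```r
# Same 50/50 split as in the question (faithful ships with base R)
smp_size <- floor(0.5 * nrow(faithful))
set.seed(123)
train_ind <- sample(seq_len(nrow(faithful)), size = smp_size)
train_data <- faithful[train_ind, ]

# Selecting a single column with [, 1] drops the result to a numeric vector
x_vec <- train_data[, 1]
is.data.frame(x_vec)   # FALSE
dim(x_vec)             # NULL

# drop = FALSE keeps it as a one-column data frame
x_df <- train_data[, 1, drop = FALSE]
is.data.frame(x_df)    # TRUE
dim(x_df)              # 136 1
```

Passing train_data[, 1, drop = FALSE] and test_data[, 1, drop = FALSE] to knn.reg should therefore have the same effect as wrapping the vectors in as.data.frame, since both hand the function a one-column data frame.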