r, cross-validation, r-caret, knn

KNN using the caret cross-validation function


I need to run R code that tunes KNN over k = 1:12 with the number of folds set to 1, but the following warnings were displayed:

> warnings()
Warning messages:
1: model fit failed for Fold1.Rep1: k= 1 Error in x[1, 1] : subscript out of bounds

2: model fit failed for Fold1.Rep1: k= 2 Error in x[1, 1] : subscript out of bounds

3: model fit failed for Fold1.Rep1: k= 3 Error in x[1, 1] : subscript out of bounds

. . .

12: model fit failed for Fold1.Rep1: k=12 Error in x[1, 1] : subscript out of bounds

13: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo,  ... :
  There were missing values in resampled performance measures.

This is the R code, which uses the caret package:

library(caret)  # train(), trainControl()
library(MASS)   # assumed source of the biopsy data set

biopsy_ <- na.omit(biopsy[, -1])  # drop the ID column, then rows with NA

ctrl <- trainControl(method = "repeatedcv", number = 1, repeats = 1)
nn_grid <- expand.grid(k = 1:12)
nn_grid

best_knn <- train(class ~ ., data = biopsy_,
                  method = "knn",
                  trControl = ctrl,
                  preProcess = c("center", "scale"),  # standardize
                  tuneGrid = nn_grid)
print(best_knn)

Solution

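  • Setting number = 1 asks for a single fold, so the held-out fold contains every row and nothing is left to train on, which is most likely why every fit fails. Use at least 2 folds (10 is a common choice). Try this: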

    grid <- expand.grid(k = 1:12)
    
    set.seed(1)  # reproducible partition and resamples
    
    # 75/25 train-test partition, stratified by class
    index <- caret::createDataPartition(biopsy_$class, p = 0.75, list = FALSE)
    
    train <- biopsy_[index, ]
    test  <- biopsy_[-index, ]
    
    ctrl <- caret::trainControl(method  = "repeatedcv",
                                number  = 10,  # folds per repeat; must be at least 2
                                repeats = 10)  # how many times the 10-fold CV is repeated
    
    model <- caret::train(class ~ .,
                          data = train,
                          method = "knn",
                          trControl = ctrl,
                          preProcess = c("center", "scale"),
                          tuneGrid = grid)
    
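    # plot(model)      # resampled accuracy versus k
    # model$bestTune   # best k
    
    # class is a factor, so evaluate with a confusion matrix rather than RMSE
    # predictions <- predict(model, newdata = test)
    # caret::confusionMatrix(predictions, test$class)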
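
Why a single fold cannot work can be seen with a small base-R sketch. It assumes folds are assigned the usual k-fold way: each row gets one fold label, and the model is trained on every row outside the held-out fold. The helper name fold_train_sizes and the row count of 100 are made up for illustration and are not part of caret.

```r
# Hypothetical helper (not part of caret): for n rows split into k folds,
# return how many rows are left for training when each fold is held out.
fold_train_sizes <- function(n, k) {
  # Assign fold labels 1..k to the rows as evenly as possible.
  fold_id <- rep(seq_len(k), length.out = n)
  # The training rows for fold i are all rows whose label is not i.
  vapply(seq_len(k), function(i) sum(fold_id != i), integer(1))
}

# n = 100 is an arbitrary illustrative row count, not the size of biopsy_.
one_fold  <- fold_train_sizes(n = 100, k = 1)
ten_folds <- fold_train_sizes(n = 100, k = 10)

print(one_fold)   # 0 rows left to train on
print(ten_folds)  # 90 rows left in each of the 10 resamples
```

With one fold the training set is empty, which is consistent with the "subscript out of bounds" failures in the question. Any number of 2 or more leaves rows to fit on.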