Tags: r, knn, tidymodels

Tidymodels: iterate over multiple values of k in nearest_neighbor()?


I'm new to the tidymodels framework and want to use the nearest_neighbor() function across multiple values of k, e.g. c(3, 5, 8, 11), but I don't know how to do that, or at what stage of the process I should specify the values (if it is possible at all).

I tried

nearest_neighbor(neighbors = c(3,5,8,11))

but that does not work, because neighbors must be a length-1 positive integer.


Solution

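  • I'd use the tuning functions built into tidymodels: set neighbors = tune() in the model specification, then pass the candidate values as a grid to tune_grid(), which evaluates each value across resamples (take a look at the relevant chapter in the tidymodels book). See the example below.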

    There is also the usemodels package, which writes out template tidymodels code tailored to your specific data set.
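
    As a rough sketch of what that looks like, the snippet below calls use_kknn() from usemodels on simulated data. The data set and the formula are assumptions chosen only for illustration; substitute your own. The call prints template code to the console (typically a recipe, a nearest_neighbor() specification with tune() placeholders, a workflow, and a tuning call) that you can paste into a script and adapt.

    ```r
    library(tidymodels)
    library(usemodels)

    # Illustrative data only: 100 simulated rows with a numeric `outcome` column
    set.seed(1)
    sim_dat <- sim_regression(100)

    # Print template tidymodels code for a k-nearest-neighbors model
    # (the printed code is a starting point, not a finished analysis)
    use_kknn(outcome ~ ., data = sim_dat)
    ```

    The printed template still needs a grid of candidate values if you want to test specific values of k rather than an automatically generated set.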

    Example, tuning neighbors over the four candidate values:

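    library(tidymodels)

    # Resolve function-name conflicts in favour of tidymodels packages
    tidymodels_prefer()

    # Mark neighbors as a tuning parameter instead of fixing a single value
    # (the default engine is kknn, so that package must be installed)
    knn_spec <- nearest_neighbor(neighbors = tune()) %>% set_mode("regression")

    # Simulated regression data and 10-fold cross-validation resamples
    set.seed(1)
    sim_dat <- sim_regression(100)
    sim_rs <- vfold_cv(sim_dat)

    # Evaluate each candidate k on every resample
    set.seed(2)
    knn_res <- 
      knn_spec %>% 
      tune_grid(outcome ~ ., resamples = sim_rs, grid = tibble(neighbors = c(3, 5, 8, 11)))

    # Rank the candidates by mean RMSE across folds
    show_best(knn_res, metric = "rmse")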
    #> # A tibble: 4 × 7
    #>   neighbors .metric .estimator  mean     n std_err .config             
    #>       <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>               
    #> 1        11 rmse    standard    20.1    10    2.29 Preprocessor1_Model4
    #> 2         8 rmse    standard    20.2    10    2.27 Preprocessor1_Model3
    #> 3         5 rmse    standard    20.7    10    2.24 Preprocessor1_Model2
    #> 4         3 rmse    standard    22.0    10    2.12 Preprocessor1_Model1
    

    Created on 2023-03-17 with reprex v2.0.2
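
    Once the candidates have been ranked, a common next step is to fix k at the best value and refit on the full data. The sketch below repeats the setup so that it runs on its own, then uses select_best(), finalize_model(), and fit(). The object names best_k, final_spec, and final_fit are my own, and refitting on all of sim_dat is an assumption about what you want to do next; with a separate test set you would fit on the training portion instead.

    ```r
    library(tidymodels)
    tidymodels_prefer()

    # Same setup as the tuning example (requires the kknn package)
    knn_spec <- nearest_neighbor(neighbors = tune()) %>% set_mode("regression")

    set.seed(1)
    sim_dat <- sim_regression(100)
    sim_rs <- vfold_cv(sim_dat)

    set.seed(2)
    knn_res <- tune_grid(
      knn_spec,
      outcome ~ .,
      resamples = sim_rs,
      grid = tibble(neighbors = c(3, 5, 8, 11))
    )

    # One-row tibble holding the candidate with the lowest mean RMSE
    best_k <- select_best(knn_res, metric = "rmse")

    # Replace the tune() placeholder with that value
    final_spec <- finalize_model(knn_spec, best_k)

    # Refit on the full simulated data set and predict a few rows
    final_fit <- fit(final_spec, outcome ~ ., data = sim_dat)
    preds <- predict(final_fit, new_data = head(sim_dat))
    ```

    In the results shown above, the lowest mean RMSE belongs to neighbors = 11, though the standard errors suggest the four candidates are not clearly separated.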