I'm new to the tidymodels framework and want to fit nearest_neighbor() with several values of K, e.g.
c(3,5,8,11),
but I don't know how to do that, or at what stage of the process I should specify the values (if it is possible at all).
I tried
nearest_neighbor(neighbors = c(3,5,8,11))
but that fails because neighbors must be a length 1 positive integer.
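I'd use the tuning functions built into tidymodels (take a look at the tuning chapter in the tidymodels book): mark neighbors with tune() in the model specification, then pass the candidate values to tune_grid() as a grid. See the example below.
There is also the usemodels package, which will write out proper code for your specific data set.
Example: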
library(tidymodels)
tidymodels_prefer()
knn_spec <- nearest_neighbor(neighbors = tune()) %>% set_mode("regression")
set.seed(1)
sim_dat <- sim_regression(100)
sim_rs <- vfold_cv(sim_dat)
set.seed(2)
knn_res <-
  knn_spec %>%
  tune_grid(
    outcome ~ .,
    resamples = sim_rs,
    grid = tibble(neighbors = c(3, 5, 8, 11))
  )
show_best(knn_res, metric = "rmse")
#> # A tibble: 4 × 7
#>   neighbors .metric .estimator  mean     n std_err .config
#>       <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>
#> 1        11 rmse    standard    20.1    10    2.29 Preprocessor1_Model4
#> 2         8 rmse    standard    20.2    10    2.27 Preprocessor1_Model3
#> 3         5 rmse    standard    20.7    10    2.24 Preprocessor1_Model2
#> 4         3 rmse    standard    22.0    10    2.12 Preprocessor1_Model1
Created on 2023-03-17 with reprex v2.0.2
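Once the grid has been evaluated, you will probably want to carry one value of neighbors forward into a final model. The following is a minimal sketch of that step, not part of the reprex above. It repeats the same setup so it runs on its own, and it assumes the kknn package (the default engine for nearest_neighbor()) is installed. The names best_k, final_spec and final_fit are only illustrative.

```r
library(tidymodels)
tidymodels_prefer()

# Same setup as in the example above
knn_spec <- nearest_neighbor(neighbors = tune()) %>% set_mode("regression")

set.seed(1)
sim_dat <- sim_regression(100)
sim_rs <- vfold_cv(sim_dat)

set.seed(2)
knn_res <- tune_grid(
  knn_spec,
  outcome ~ .,
  resamples = sim_rs,
  grid = tibble(neighbors = c(3, 5, 8, 11))
)

# One-row tibble holding the candidate with the lowest resampled RMSE
best_k <- select_best(knn_res, metric = "rmse")

# Replace the tune() placeholder with the selected value
final_spec <- finalize_model(knn_spec, best_k)

# Fit the finalized specification on the full data set
final_fit <- fit(final_spec, outcome ~ ., data = sim_dat)
```

select_best() returns the row that show_best() lists first, which in the output above is the 11-neighbor model. If you would rather choose a value yourself, finalize_model() also accepts a plain tibble such as tibble(neighbors = 8).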