I'm following a tutorial from Julia Silge (link here) on using tidymodels and recipes. I can get most of the way through without a problem, but when I come to calling the fit_resamples() function I get the error: Error: The first argument to [fit_resamples()] should be either a model or workflow.
I'm copying the code in the tutorial character for character, and everything runs fine up to and including printing out validation_splits. But as soon as I call fit_resamples() I get the error above (link to relevant part of tutorial). If useful, the output of rlang::last_error() is:
<error/rlang_error>
The first argument to [fit_resamples()] should be either a model or workflow.
Backtrace:
1. tune::fit_resamples(...)
2. tune:::fit_resamples.default(...)
Does anyone know what's going on here, and how I can resolve it? My understanding is that the first argument I pass to fit_resamples() is a model, i.e. children ~ ., and I've passed this same model to other functions earlier in the script without issue. See below for code (and data) that leads to the error on my machine, and my sessionInfo().
library(tidyverse)
## Bring in data
hotels <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-02-11/hotels.csv')
hotel_stays <- hotels %>%
  filter(is_canceled == 0) %>%
  mutate(children = case_when(children + babies > 0 ~ 'children',
                              TRUE ~ 'none'),
         required_car_parking_spaces = case_when(required_car_parking_spaces > 0 ~ 'parking',
                                                 TRUE ~ 'none')) %>%
  select(-is_canceled, -reservation_status, -babies)
hotels_df <- hotel_stays %>%
  select(children, hotel, arrival_date_month, meal, adr, adults,
         required_car_parking_spaces, total_of_special_requests,
         stays_in_week_nights, stays_in_weekend_nights) %>%
  mutate_if(is.character, factor)
## Build models
library(tidymodels)
set.seed(1234)
hotel_split <- initial_split(hotels_df)
hotel_train <- training(hotel_split)
hotel_test <- testing(hotel_split)
hotel_rec <- recipe(children ~ ., data = hotel_train) %>%
  step_downsample(children) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_zv(all_numeric()) %>%
  step_normalize(all_numeric()) %>%
  prep()
test_proc <- bake(hotel_rec, new_data = hotel_test)
knn_spec <- nearest_neighbor() %>%
  set_engine('kknn') %>%
  set_mode('classification')
knn_fit <- knn_spec %>%
  fit(children ~ .,
      data = juice(hotel_rec))
knn_fit
## Evaluate models
set.seed(1234)
validation_splits <- mc_cv(juice(hotel_rec), prop = 0.9, strata = children)
validation_splits
## This is where I get the error
knn_res <- fit_resamples(
  children ~ .,
  knn_spec,
  validation_splits,
  control = control_resamples(save_pred = TRUE)
)
sessionInfo():
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS High Sierra 10.13.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GGally_2.1.2.9000 skimr_2.1.3 silgelib_0.1.1 forcats_0.5.1
[5] stringr_1.4.0 readr_1.4.0 tidyverse_1.3.1 knitr_1.33
[9] yardstick_0.0.8 workflowsets_0.0.2 workflows_0.2.2 tune_0.1.5
[13] tidyr_1.1.3 tibble_3.1.2 rsample_0.1.0 recipes_0.1.16
[17] purrr_0.3.4 parsnip_0.1.6 modeldata_0.1.0 infer_0.5.4
[21] ggplot2_3.3.5 dplyr_1.0.7 dials_0.0.9 scales_1.1.1
[25] broom_0.7.6 tidymodels_0.1.3
loaded via a namespace (and not attached):
[1] colorspace_2.0-1 ellipsis_0.3.2 class_7.3-19 base64enc_0.1-3
[5] fs_1.5.0 rstudioapi_0.13 listenv_0.8.0 furrr_0.2.3
[9] farver_2.1.0 prodlim_2019.11.13 fansi_0.5.0 lubridate_1.7.10
[13] xml2_1.3.2 codetools_0.2-18 splines_4.1.0 jsonlite_1.7.2
[17] pROC_1.17.0.1 dbplyr_2.1.1 shiny_1.6.0 compiler_4.1.0
[21] httr_1.4.2 backports_1.2.1 assertthat_0.2.1 Matrix_1.3-3
[25] fastmap_1.1.0 cli_2.5.0 later_1.2.0 htmltools_0.5.1.1
[29] prettyunits_1.1.1 tools_4.1.0 igraph_1.2.6 gtable_0.3.0
[33] glue_1.4.2 Rcpp_1.0.6 cellranger_1.1.0 DiceDesign_1.9
[37] vctrs_0.3.8 iterators_1.0.13 timeDate_3043.102 gower_0.2.2
[41] xfun_0.23 globals_0.14.0 rvest_1.0.0 mime_0.10
[45] lifecycle_1.0.0 kknn_1.3.1 future_1.21.0 MASS_7.3-54
[49] ipred_0.9-11 hms_1.1.0 promises_1.2.0.1 parallel_4.1.0
[53] RColorBrewer_1.1-2 yaml_2.2.1 curl_4.3.1 rpart_4.1-15
[57] reshape_0.8.8 stringi_1.6.2 foreach_1.5.1 lhs_1.1.1
[61] lava_1.6.9 repr_1.1.3 rlang_0.4.11 pkgconfig_2.0.3
[65] evaluate_0.14 lattice_0.20-44 htmlwidgets_1.5.3 labeling_0.4.2
[69] tidyselect_1.1.1 parallelly_1.26.0 plyr_1.8.6 magrittr_2.0.1
[73] R6_2.5.0 generics_0.1.0 DBI_1.1.1 pillar_1.6.1
[77] haven_2.4.1 withr_2.4.2 survival_3.2-11 nnet_7.3-16
[81] modelr_0.1.8 crayon_1.4.1 utf8_1.2.1 rmarkdown_2.8
[85] progress_1.2.2 grid_4.1.0 readxl_1.3.1 reprex_2.0.0
[89] digest_0.6.27 xtable_1.8-4 httpuv_1.6.1 GPfit_1.0-8
[93] munsell_0.5.0
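The blog post you are looking at is fairly old, and tune has since changed so that the first argument to fit_resamples() must be either a model specification or a workflow, with the formula or recipe coming second. A formula such as children ~ . is a preprocessor, not a model, so it can no longer go first. Hence the error message: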
The first argument to [fit_resamples()] should be either a model or workflow.
The fix is to put your model or workflow as the first argument, like this:
knn_res <- fit_resamples(
  knn_spec,
  children ~ .,
  validation_splits,
  control = control_resamples(save_pred = TRUE)
)
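Another way to avoid the argument-order problem is to bundle the model and the formula into a workflow and pass that as the first argument. The following is only a minimal sketch: it assumes the kknn package is installed, and it uses a two-class subset of the built-in iris data as a stand-in for juice(hotel_rec). The names toy_df, toy_splits and knn_wf are made up for illustration and do not come from the tutorial.

```r
library(tidymodels)

set.seed(1234)

# Assumption: toy stand-in for juice(hotel_rec), a two-class factor outcome
# with numeric predictors (not the hotels data from the question)
toy_df <- iris %>%
  filter(Species != "setosa") %>%
  mutate(Species = droplevels(Species))

# Monte Carlo cross-validation, as in the question (fewer repeats to keep it quick)
toy_splits <- mc_cv(toy_df, prop = 0.9, times = 5, strata = Species)

# Same model specification as in the question
knn_spec <- nearest_neighbor() %>%
  set_engine("kknn") %>%
  set_mode("classification")

# Bundle the model and the formula so there is a single object to pass first
knn_wf <- workflow() %>%
  add_model(knn_spec) %>%
  add_formula(Species ~ .)

# The workflow goes first, followed by the resamples
knn_res <- fit_resamples(
  knn_wf,
  resamples = toy_splits,
  control = control_resamples(save_pred = TRUE)
)

collect_metrics(knn_res)
```

With the hotels data, the same pattern would use add_formula(children ~ .) and validation_splits in place of the toy objects.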