I am trying to build a regression model using the Boston Housing data using the caret
package. The code is as follows
library(tidyverse)
library(ggplot2)
library(lubridate)
library(broom)
library(caret)
library(xgboost)
#list.files()
options(scipen = 999)
library(MASS)
data_model <- Boston
data_model <- as.data.frame(data_model)
# based on this link https://stackoverflow.com/questions/51762536/r-xgboost-on-caret-attempts-to-perform-classification-instead-of-regression
data_model$medv <- as.double(data_model$medv)
data_model$zn <- as.double(data_model$zn)
xgb_grid_1 = expand.grid(
nrounds = 1000,
max_depth = c(2, 4, 6, 8, 10),
eta=c(0.5, 0.1, 0.07),
gamma = 0.01,
colsample_bytree=0.5,
min_child_weight=1,
subsample=0.5
)
xgb_trcontrol_1 = trainControl(
method = "cv",
number = 5,
allowParallel = TRUE
)
xgb_train_1 = train(
x = data_model %>% dplyr::select(-medv) %>% as.matrix(),
y = as.matrix(data_model$medv),
trControl = xgb_trcontrol_1,
tuneGrid = xgb_grid_1,
method = "xgbTree",
metric = 'RMSE'
)
sessionInfo()
But when I run the train()
function I get the error Error: Metric RMSE not applicable for classification models
. Then I tried to change variables that were integers
to double
as suggested by this link. I still seem to get the same error. Am I missing out on an extra parameter that should take care of this? Thank You in advance! I have also included my session information below in case there is version conflict that I am not aware of.
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MASS_7.3-47 bindrcpp_0.2.2 xgboost_0.71.2 caret_6.0-81 lattice_0.20-35 broom_0.4.2 lubridate_1.6.0 dplyr_0.7.8 purrr_0.2.3
[10] readr_1.1.1 tidyr_0.7.2 tibble_1.4.2 ggplot2_2.2.1.9000 tidyverse_1.1.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 class_7.3-14 utf8_1.1.3 assertthat_0.2.0 ipred_0.9-6 psych_1.7.5 foreach_1.4.3 R6_2.2.2
[9] cellranger_1.1.0 plyr_1.8.4 stats4_3.4.0 httr_1.3.1 pillar_1.2.1 rlang_0.3.0.1 lazyeval_0.2.1 readxl_1.0.0
[17] rstudioapi_0.7 data.table_1.10.4 rpart_4.1-11 Matrix_1.2-9 splines_3.4.0 gower_0.1.2 stringr_1.3.0 foreign_0.8-67
[25] munsell_0.4.3 compiler_3.4.0 modelr_0.1.1 pkgconfig_2.0.1 mnormt_1.5-5 nnet_7.3-12 tidyselect_0.2.5 prodlim_2018.04.18
[33] codetools_0.2-15 crayon_1.3.4 withr_2.1.2 recipes_0.1.4 ModelMetrics_1.1.0 grid_3.4.0 nlme_3.1-131 jsonlite_1.5
[41] gtable_0.2.0 magrittr_1.5 waterfalls_0.1.2 scales_0.5.0.9000 cli_1.0.0 stringi_1.1.7 reshape2_1.4.3 timeDate_3012.100
[49] xml2_1.2.0 generics_0.0.1 lava_1.6.1 iterators_1.0.8 tools_3.4.0 forcats_0.2.0 glue_1.3.0 hms_0.3
[57] parallel_3.4.0 survival_2.41-3 colorspace_1.3-2 xgboostExplainer_0.1 rvest_0.3.2 bindr_0.1.1 haven_1.1.0
You have already converted data_model$zn
to double
. So, just remove as.matrix
in the parameter y = as.matrix(data_model$medv)