I want to combine recursive feature elimination with rfe()
and tuning together with model selection with trainControl()
using the method rf
(random forest). Instead of the standard summary statistic I would like to have the MAPE (mean absolute percentage error). Therefore I tried the following code using the ChickWeight
data set:
library(caret)
library(randomForest)
library(MLmetrics)
# Compute MAPE instead of other metrics
mape <- function(data, lev = NULL, model = NULL){
mape <- MAPE(y_pred = data$pred, y_true = data$obs)
c(MAPE = mape)
}
# specify trainControl
trc <- trainControl(method="repeatedcv", number=10, repeats=3, search="grid", savePred =T,
summaryFunction = mape)
# set up grid
tunegrid <- expand.grid(.mtry=c(1:3))
# specify rfeControl
rfec <- rfeControl(functions=rfFuncs, method="cv", number=10, saveDetails = TRUE)
set.seed(42)
results <- rfe(weight ~ Time + Chick + Diet,
sizes=c(1:3), # number of predictors from which should algorithm chose the best predictor
data = ChickWeight,
method="rf",
ntree = 250,
metric= "RMSE",
tuneGrid=tunegrid,
rfeControl=rfec,
trControl = trc)
The code runs without errors. But where do I find the MAPE, which I defined as a summaryFunction
in trainControl
? Is trainControl
executed or ignored?
How could I rewrite the code in order to do recursive feature elimination with rfe
and then tune the hyperparameter mtry
using trainControl
within rfe
and at the same time compute an additional error measure (MAPE)?
trainControl
is ignored, as its description
Control the computational nuances of the train function
would suggest. To use MAPE, you want
rfec$functions$summary <- mape
Then
rfe(weight ~ Time + Chick + Diet,
sizes = c(1:3),
data = ChickWeight,
method ="rf",
ntree = 250,
metric = "MAPE", # Modified
maximize = FALSE, # Modified
rfeControl = rfec)
#
# Recursive feature selection
#
# Outer resampling method: Cross-Validated (10 fold)
#
# Resampling performance over subset size:
#
# Variables MAPE MAPESD Selected
# 1 0.1903 0.03190
# 2 0.1029 0.01727 *
# 3 0.1326 0.02136
# 53 0.1303 0.02041
#
# The top 2 variables (out of 2):
# Time, Chick.L