Tags: r, resampling, auc, mlr

auc in mlr benchmark experiment for classification problem gives error (requires predict type to be: 'prob')


I am conducting a benchmark analysis using the mlr package and would like to use auc as my performance measure. I have specified predict.type = "prob" and am still getting the following error message:

0001: Error in FUN(X[[i]], ...) : 
  Measure auc requires predict type to be: 'prob'!

My code:

#define measures
meas <- list(acc, mlr::auc, brier)

##random forest
p_length <- ncol(training_complete) - 1
lrn_RF = makeLearner("classif.randomForest", predict.type = "prob", par.vals = list("ntree" = 500L))
wcw_lrn_RF = makeWeightedClassesWrapper(lrn_RF, wcw.weight = 0.10) #weighted class wrapper
parsRF = makeParamSet(
  makeIntegerParam("mtry", lower = 1 , upper = floor(0.4*p_length)),
  makeIntegerParam("nodesize", lower = 10, upper = 50))
tuneRF = makeTuneControlRandom(maxit = 100)
inner = makeResampleDesc("CV", iters = 2)
learnerRF = makeTuneWrapper(lrn_RF, resampling = inner, meas, par.set = parsRF, control = tuneRF, show.info = FALSE)

##extreme gradient boosting
lrn_xgboost <- makeLearner(
  "classif.xgboost",
  predict.type = "prob", #before was response
  par.vals = list(objective = "binary:logistic", eval_metric = "error", nrounds = 200)) 
getParamSet("classif.xgboost")
pars_xgboost = makeParamSet(
  makeIntegerParam("nrounds", lower = 100, upper = 500),
  makeIntegerParam("max_depth", lower = 1, upper = 10),
  makeNumericParam("eta", lower = .1, upper = .5),
  makeNumericParam("lambda", lower = -1, upper = 0, trafo = function(x) 10^x))
tunexgboost = makeTuneControlRandom(maxit = 50) 
inner = makeResampleDesc("CV", iters = 2)
learnerxgboost = makeTuneWrapper(lrn_xgboost, resampling = inner, meas, par.set = pars_xgboost, control = tunexgboost, show.info = FALSE)


##Benchmarking via outer resampling loop

#Learners to be compared
lrns = list(
  makeLearner("classif.featureless"), 
  learnerRF,
  learnerxgboost
)

#outer resampling strategy
rdesc = makeResampleDesc("CV", iters = 5) 

library(methods)
library(parallel)
library(parallelMap)

set.seed(123, "L'Ecuyer") 

parallelStartSocket(parallel::detectCores(), level = "mlr.resample")

churn_benchmarking <- benchmark(learners = lrns,
                                tasks = trainTask,
                                resamplings = rdesc,
                                models = FALSE,
                                measures = meas)

parallelStop()

Any hint is highly appreciated!


Solution

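  • The problem is the featureless learner: it is created without predict.type, so it defaults to "response" and cannot supply the probabilities that auc requires. benchmark() evaluates every measure on every learner, so this one learner triggers the error even though the random forest and xgboost learners are already set to "prob".

    Write makeLearner("classif.featureless", predict.type = "prob") instead.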
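
A minimal sketch of the fix, for illustration only. It assumes mlr and rpart are installed and uses mlr's built-in sonar.task and a plain classif.rpart learner as stand-ins (neither appears in the question) for trainTask and the tuned learners. The vapply() check is just one possible way to confirm, before benchmarking, that every learner predicts probabilities.

```r
library(mlr)

# Same measures as in the question; auc and brier need probability predictions
meas <- list(acc, mlr::auc, brier)

# Every learner in the benchmark must use predict.type = "prob",
# including the featureless baseline
lrns <- list(
  makeLearner("classif.featureless", predict.type = "prob"),
  makeLearner("classif.rpart", predict.type = "prob")  # stand-in for the tuned learners
)

# Sanity check before benchmarking: list the predict type of each learner
predict_types <- vapply(lrns, function(l) l$predict.type, character(1))
print(predict_types)

# Small outer resampling on a built-in binary task (stand-in for trainTask)
rdesc <- makeResampleDesc("CV", iters = 3)
set.seed(123)
bmr <- benchmark(learners = lrns,
                 tasks = sonar.task,
                 resamplings = rdesc,
                 measures = meas,
                 show.info = FALSE)

# Aggregated performance per learner as a data frame
perf <- getBMRAggrPerformances(bmr, as.df = TRUE)
print(perf)
```

If the check prints "response" for any entry, that learner is the one that will make auc fail.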