
Tuning the classification threshold in mlr


I am training a Naive Bayes model using the mlr package.

I would like to tune the threshold (and only the threshold) for the classification. The mlr tutorial provides an example of doing this while also tuning additional hyperparameters in a nested CV setting. I do not actually want to tune any other (hyper)parameter while finding the optimal threshold value.

Based on the discussion here, I set up a makeTuneWrapper() object, fixed another parameter (laplace) to a single value (1), and then ran resample() in a nested CV setting.

library(mlr)

nbayes.lrn <- makeLearner("classif.naiveBayes", predict.type = "prob")
nbayes.lrn

# Fix laplace to a single value so that only the threshold is tuned
nbayes.pst <- makeParamSet(makeDiscreteParam("laplace", values = 1))
nbayes.tcg <- makeTuneControlGrid(tune.threshold = TRUE)

# Inner resampling
rsmp.cv5.desc <- makeResampleDesc("CV", iters = 5, stratify = TRUE)
nbayes.lrn <- makeTuneWrapper(nbayes.lrn, par.set = nbayes.pst, control = nbayes.tcg,
                              resampling = rsmp.cv5.desc, measures = tpr)

# Outer resampling
rsmp.cv10.desc <- makeResampleDesc("CV", iters = 10, stratify = TRUE)
nbayes.res <- resample(nbayes.lrn, beispiel3.tsk, resampling = rsmp.cv10.desc,
                       measures = list(tpr, ppv), extract = getTuneResult)

print(nbayes.res$extract)

Setting up a resampling scheme for the inner loop in the nested CV seems superfluous. The internal call to tuneThreshold() apparently does a more thorough optimization anyhow. However, calling makeTuneWrapper() without a resampling scheme leads to an error message.
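A cheaper inner resampling also satisfies makeTuneWrapper()'s resampling requirement, for example a single stratified holdout split instead of the 5-fold CV. The following is only a sketch of that idea (the names nbayes.base.lrn and nbayes.holdout.lrn are placeholders, not part of my original code):

# Sketch: single holdout split as a minimal inner resampling,
# since the grid over laplace has only one candidate anyway
nbayes.base.lrn <- makeLearner("classif.naiveBayes", predict.type = "prob")
rsmp.holdout.desc <- makeResampleDesc("Holdout", stratify = TRUE)
nbayes.holdout.lrn <- makeTuneWrapper(nbayes.base.lrn, par.set = nbayes.pst,
                                      control = nbayes.tcg,
                                      resampling = rsmp.holdout.desc, measures = tpr)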

I have two specific questions:

1.) Is there a simpler way of tuning a threshold (and only the threshold)?

2.) Given the setup used above: how can I access the threshold values that were actually tested?

EDIT:

Here is a code example for tuning the threshold for different measures (accuracy, sensitivity, precision), based on the answer by @Lars Kotthoff.

### Create fake data
y <- c(rep(0, 500), rep(1, 500))
x <- c(rep(0, 300), rep(1, 200), rep(0, 100), rep(1, 400))
balanced.df <- data.frame(y = y, x = x)
balanced.df$y <- as.factor(balanced.df$y)
balanced.df$x <- as.factor(balanced.df$x)
balanced.tsk <- makeClassifTask(data = balanced.df, target = "y", positive = "1")
summarizeColumns(balanced.tsk)

### Tune the threshold for different measures
logreg.lrn <- makeLearner("classif.logreg", predict.type = "prob")
logreg.mod <- train(logreg.lrn, balanced.tsk)
logreg.preds <- predict(logreg.mod, balanced.tsk)
threshold_tpr <- tuneThreshold(logreg.preds, measure = list(tpr))
threshold_tpr
threshold_acc <- tuneThreshold(logreg.preds, measure = list(acc))
threshold_acc
threshold_ppv <- tuneThreshold(logreg.preds, measure = list(ppv))
threshold_ppv
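
If I read the tuneThreshold() documentation correctly, each result is a list holding the tuned threshold in th and the achieved performance in perf:

# Assuming tuneThreshold() returns a list with elements th and perf
threshold_acc$th    # tuned threshold
threshold_acc$perf  # accuracy at that threshold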

Solution

  • You can use tuneThreshold() directly:

    require(mlr)
    
    # Train a probabilistic Naive Bayes model and predict on the same task
    iris.model = train(makeLearner("classif.naiveBayes", predict.type = "prob"), iris.task)
    iris.preds = predict(iris.model, iris.task)
    
    res = tuneThreshold(iris.preds)

    Unfortunately, you can't access the threshold values that were tested when using tuneThreshold(). You could, however, treat the threshold value as a "normal" hyperparameter and use any of the tuning methods in mlr, as sketched below. This would allow you to get the tested values and the corresponding performance.
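
    As a sketch of that idea for a binary task (the variable names ths, perfs and tvp are placeholders, and logreg.preds is the prediction object from the edit above):

    # Manual grid over thresholds: set each threshold on the prediction
    # object and record the resulting performance
    ths = seq(0, 1, by = 0.01)
    perfs = sapply(ths, function(t) performance(setThreshold(logreg.preds, t), measures = acc))
    data.frame(threshold = ths, acc = perfs)

    # Alternatively, generateThreshVsPerfData() evaluates a grid of thresholds
    # and keeps the tested threshold values alongside the performances
    tvp = generateThreshVsPerfData(logreg.preds, measures = list(tpr, ppv, acc))
    head(tvp$data)
    plotThreshVsPerf(tvp)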