
Meaning of alpha and beta parameters in function makeFeatSelControlSequential (mlr package in R)

For deterministic forward or backward search, I'm used to giving thresholds on the p-values of the coefficients of individual features. In the documentation of makeFeatSelControlSequential in R/mlr https://www.rdocumentation.org/packages/mlr/versions/2.13/topics/FeatSelControl, the alpha and beta parameters are described as follows:

alpha (numeric(1)): Parameter of the sequential feature selection. Minimal required value of improvement difference for a forward / adding step. Default is 0.01.
beta (numeric(1)): Parameter of the sequential feature selection. Minimal required value of improvement difference for a backward / removing step. Negative values imply that you allow a slight decrease for the removal of a feature. Default is -0.001.

However, it is not clear what "improvement difference" means here. In the example below, I set the threshold for backward selection (the beta parameter) to 0. If this parameter were a threshold on p-values, I would expect to end up with a model without any features, but this is not the case: I get an AUC of 0.9886302 instead of 0.5.

# 1. Generate a synthetic dataset for supervised learning (two classes)
###################################################################

library(mlbench)

# generate 1000 rows with 21 quantitative candidate predictors and 1 target variable (3 classes)
p <- mlbench.waveform(1000)

# convert the list into a data frame
dataset <- as.data.frame(p)

# drop the third class to keep 2 classes
dataset2 <- subset(dataset, classes != 3)
dataset2 <- droplevels(dataset2)


# 2. Perform cross validation with embedded feature selection using logistic regression
##########################################################################################

library(BBmisc)
library(mlr)

# set the random number generator kind and seed for reproducible resampling
set.seed(21, kind = "L'Ecuyer-CMRG")

# Define the classification task
mCT <- makeClassifTask(data = dataset2, target = "classes")

# Choose the algorithm: logistic regression with probability predictions (needed for AUC)
mL <- makeLearner("classif.logreg", predict.type = "prob")

# Outer resampling: stratified 10-fold cross-validation
outer = makeResampleDesc("CV", iters = 10, stratify = TRUE)

# Feature selection method: sequential backward search (sbs), no iteration limit, beta = 0
ctrl = makeFeatSelControlSequential(method = "sbs", maxit = NA, beta = 0)

# Inner resampling: stratified holdout split used to score candidate feature sets
inner = makeResampleDesc("Holdout", stratify = TRUE)

lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,
             measures = list(mlr::auc, mlr::acc, mlr::brier), models = TRUE)

Solution

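The alpha and beta parameters set the minimum change in performance (measured with whatever performance measure drives the search) that a single forward or backward step must achieve before it is taken. mlr doesn't compute any p-values, and no p-values are used in this process. Note that in the question's code no `measures` argument is passed to `makeFeatSelWrapper`, so the inner search is driven by the task's default measure (mean misclassification error, `mmce`, for classification tasks); the AUC, accuracy and Brier score passed to `resample` are only used to evaluate the outer folds.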
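
To make "improvement difference" concrete, the per-step check can be sketched as follows. This is a minimal illustration, not mlr's internal code: it assumes the difference is the change in performance signed so that "better" is positive for the chosen measure, and that a step whose improvement exactly equals the threshold is accepted (mlr's tie handling may differ). The helpers `step_improvement()` and `accept_step()` are hypothetical names.

```r
# Hypothetical helpers illustrating the per-step rule; these are NOT mlr functions.

# Signed improvement of a candidate feature set over the current one:
# positive means "better" for the chosen measure, whichever direction it optimizes.
step_improvement <- function(current, candidate, minimize) {
  if (minimize) current - candidate else candidate - current
}

# A step is taken only if its improvement reaches the threshold:
# alpha for a forward (adding) step, beta for a backward (removing) step.
# Assumption: ties count as accepted (>=).
accept_step <- function(current, candidate, minimize, threshold) {
  step_improvement(current, candidate, minimize) >= threshold
}

# Forward step scored by AUC (maximized): +0.005 is below alpha = 0.01, so it is rejected
accept_step(current = 0.90, candidate = 0.905, minimize = FALSE, threshold = 0.01)    # FALSE

# Backward step scored by misclassification error (minimized): the error rises by 0.0005.
# The default beta = -0.001 tolerates this small loss...
accept_step(current = 0.100, candidate = 0.1005, minimize = TRUE, threshold = -0.001) # TRUE
# ...but beta = 0 does not
accept_step(current = 0.100, candidate = 0.1005, minimize = TRUE, threshold = 0)      # FALSE
```

The key point is that the threshold is compared against a difference in the performance measure, so its scale depends on the measure (e.g. accuracy or AUC in [0, 1]), not on any statistical test.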

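Because the parameters only govern individual steps, they do not directly determine the final feature set. Under the hood, in a forward search mlr evaluates every feature set that extends the current one by a single feature and moves to the best of them, as long as it provides at least the improvement specified by alpha. A backward search works the same way in reverse: it evaluates every set obtained by removing a single feature and moves to the best one if the change in performance is at least beta (a negative beta tolerates a small loss). This repeats until all features (forward search) or no features (backward search) are present, until no step achieves the minimum improvement, or until a limit such as `maxit` is reached. With `beta = 0`, as in the question, a removal is only taken if it does not reduce the inner-resampling performance, so the backward search stops as soon as every possible removal would make the estimate worse. That is why the selected models keep their informative features and reach an AUC near 0.99 instead of collapsing to an empty model with an AUC of 0.5.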
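
The backward search loop can be sketched in base R. This is an illustrative reimplementation under stated assumptions, not mlr's code: performance is the holdout accuracy of a `glm` logistic regression (mirroring the `Holdout` inner resampling in the question, but scored with accuracy), the best single removal is taken at each step, and a removal is accepted when the accuracy change is at least `beta`. The data are simulated, and `holdout_acc()` and `backward_select()` are hypothetical names.

```r
# Minimal sketch of sequential backward selection with a beta threshold.
# Assumptions: holdout accuracy of a logistic regression as the performance,
# best single removal per step, removal accepted if change >= beta. Not mlr internals.
set.seed(1)

# Simulated data: x1 and x2 carry signal, x3..x5 are pure noise
n <- 400
X <- as.data.frame(matrix(rnorm(n * 5), ncol = 5,
                          dimnames = list(NULL, paste0("x", 1:5))))
X$y <- factor(rbinom(n, 1, plogis(2 * X$x1 - 1.5 * X$x2)))

# Single holdout split, analogous to makeResampleDesc("Holdout")
train_idx <- sample(n, size = round(2 / 3 * n))
train <- X[train_idx, ]
test  <- X[-train_idx, ]

# Holdout accuracy of a logistic regression fitted on the given feature set
holdout_acc <- function(features) {
  rhs <- if (length(features) == 0) "1" else paste(features, collapse = " + ")
  fit <- glm(as.formula(paste("y ~", rhs)), data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")  # P(y = second level)
  pred <- ifelse(prob > 0.5, levels(train$y)[2], levels(train$y)[1])
  mean(pred == test$y)
}

backward_select <- function(features, beta) {
  current <- holdout_acc(features)
  while (length(features) > 0) {
    # Score every feature set obtained by removing exactly one feature
    scores <- vapply(features, function(f) holdout_acc(setdiff(features, f)),
                     numeric(1))
    best <- which.max(scores)
    # Stop if even the best removal changes performance by less than beta
    if (scores[best] - current < beta) break
    features <- features[-best]
    current <- unname(scores[best])
  }
  list(features = features, performance = current)
}

res <- backward_select(paste0("x", 1:5), beta = 0)
res$features
res$performance
```

With `beta = 0` the loop stops as soon as every single removal lowers the holdout accuracy, so it generally ends with the informative features (here `x1` and `x2`, plus any noise feature whose removal happens to hurt the holdout estimate) rather than an empty model. Passing a negative `beta` lets it drop features at a small cost in accuracy.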