For deterministic forward or backward search, I'm used to give thresholds for p-values linked to coefficients linked to individual features. In the documention of makeFeatSelControlSequential in R/MLR https://www.rdocumentation.org/packages/mlr/versions/2.13/topics/FeatSelControl, alpha and beta parameters are described as follow:
It is however not clear what does "improvement difference" mean here. In the example below, I gave 0 as treshold for a backward selection (beta parameter). If this parameter relates to a threshold on p-value, I would expect to get the model without feature but it is not the case as I get an AUC of 0.9886302 instead of 0.5.
# 1. Find a synthetic dataset for supervised learning (two classes)
###################################################################
library(mlbench)
data(BreastCancer)
# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable
p<-mlbench.waveform(1000)
# convert list into dataframe
dataset<-as.data.frame(p)
# drop thrid class to get 2 classes
dataset2 = subset(dataset, classes != 3)
dataset2 <- droplevels(dataset2 )
# 2. Perform cross validation with embedded feature selection using logistic regression
##########################################################################################
library(BBmisc)
library(mlr)
set.seed(123, "L'Ecuyer")
set.seed(21)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm
mL <- makeLearner("classif.logreg", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sbs", maxit = NA,beta = 0)
# Choice of sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
The parameters control what difference in performance (for whatever performance measure you choose) is acceptable to proceed with a step along a forward or backward search. mlr doesn't compute any p-values, and no p-values are used in this process.
As the parameters only control what happens in a step, they also don't directly control the final outcome. What happens under the hood is that, e.g. for forward search, mlr computes the performances of all feature sets that expand the current one with a single feature and chooses the best one as long as it provides at least the improvement specified in alpha or beta. This procedure repeats until either all features (forward search) or no features (backward search) are present or if no minimum improvement as specified by the parameters can be achieved.