
mlr: nested resampling for feature selection


I am running a benchmark experiment in which I tune a filtered learner, similar to the example given in the mlr tutorial under nested resampling and titled "Example 3: One task, two learners, feature filtering with tuning". My code is as follows:

library(survival)
library(mlr)

data(veteran)
set.seed(24601)
configureMlr(show.learner.output=TRUE, show.info=TRUE)

task_id = "MAS"
mas.task <- makeSurvTask(id = task_id, data = veteran, target = c("time", "status"))
mas.task <- createDummyFeatures(mas.task)

inner = makeResampleDesc("CV", iters=2, stratify=TRUE)  # Tuning
outer = makeResampleDesc("CV", iters=3, stratify=TRUE)  # Benchmarking

cox.lrn <- makeLearner(cl="surv.coxph", id = "coxph", predict.type="response")
cox.filt.uni.thresh.lrn = makeTuneWrapper(
  makeFilterWrapper(
    makeLearner(cl="surv.coxph", id = "cox.filt.uni.thresh", predict.type="response"), 
    fw.method="univariate.model.score", 
    perf.learner=cox.lrn
  ), 
  resampling = inner, 
  par.set = makeParamSet(makeDiscreteParam("fw.threshold", values=c(0.5, 0.6, 0.7))), 
  control = makeTuneControlGrid(),
  show.info = TRUE)

learners = list( cox.filt.uni.thresh.lrn )  
bmr = benchmark(learners=learners, tasks=mas.task, resamplings=outer, measures=list(cindex), show.info = TRUE)

Using this method, it seems that each iteration of the outer resampling loop may use a different value of fw.threshold, namely whichever value was determined to be best in the inner loop. My question is: is this acceptable, or would it be better to tune that parameter first, using tuneParams and cross-validation, and then run the benchmark using the previously tuned parameters, like this:

library(survival)
library(mlr)

data(veteran)
set.seed(24601)
configureMlr(show.learner.output=TRUE, show.info=TRUE)

task_id = "MAS"
mas.task <- makeSurvTask(id = task_id, data = veteran, target = c("time", "status"))
mas.task <- createDummyFeatures(mas.task)

inner = makeResampleDesc("CV", iters=2, stratify=TRUE)  # Tuning
outer = makeResampleDesc("CV", iters=3, stratify=TRUE)  # Benchmarking

cox.lrn <- makeLearner(cl="surv.coxph", id = "coxph", predict.type="response")
cox.filt.uni.thresh.lrn = 
  makeFilterWrapper(
    makeLearner(cl="surv.coxph", id = "cox.filt.uni.thresh", predict.type="response"), 
    fw.method="univariate.model.score", 
    perf.learner=cox.lrn
  )
params = makeParamSet(makeDiscreteParam("fw.threshold", values=c(0.5, 0.6, 0.7)))
ctrl = makeTuneControlGrid()

tuned.params = tuneParams(cox.filt.uni.thresh.lrn, mas.task, resampling = inner, par.set=params, control=ctrl, show.info = TRUE)
tuned.lrn = setHyperPars(cox.filt.uni.thresh.lrn, par.vals = tuned.params$x)

learners = list( tuned.lrn )  
bmr = benchmark(learners=learners, tasks=mas.task, resamplings=outer, measures=list(cindex), show.info = TRUE)

In this case the second method gives a slightly inferior result, but I'd like to know which is the correct method.


Solution

  • Using this method, it seems that each iteration of the outer resampling loop may use a different value of fw.threshold, namely whichever value was determined to be best in the inner loop. My question is: is this acceptable, or would it be better to tune that parameter first, using tuneParams and cross-validation, and then run the benchmark using the previously tuned parameters, like this:

    No: in the second approach the hyperparameter is tuned on the same data that is later used for benchmarking, so the model has already seen the data and the resulting performance estimate is optimistically biased. All optimization should happen inside the cross-validation, separately for each fold, which is exactly what the nested setup does. So method #1 is correct.
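
    As a quick illustration (not part of the original answer), you can check afterwards which fw.threshold the inner loop picked in each outer fold; getBMRTuneResults() extracts the tuning results stored in the benchmark object:

    # Assumes bmr from method #1 above (the tune wrapper run through benchmark()).
    # One row per outer fold; the chosen fw.threshold may differ between folds,
    # which is expected behaviour in nested resampling and not a problem.
    tune.res = getBMRTuneResults(bmr, as.df = TRUE)
    print(tune.res)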

    In addition, instead of grid search I would suggest using random search or model-based optimization (MBO).
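
    A minimal sketch of what the random-search variant could look like, reusing the task, the inner/outer resampling descriptions and cox.lrn defined in the question (the learner id, the 0.5-0.7 range and the budget of 10 iterations are illustrative choices, not recommendations from the answer):

    # Tune fw.threshold as a continuous parameter with random search
    # instead of a fixed three-value grid.
    params.rs = makeParamSet(makeNumericParam("fw.threshold", lower = 0.5, upper = 0.7))
    ctrl.rs = makeTuneControlRandom(maxit = 10L)
    cox.filt.rs.lrn = makeTuneWrapper(
      makeFilterWrapper(
        makeLearner(cl = "surv.coxph", id = "cox.filt.rs", predict.type = "response"),
        fw.method = "univariate.model.score",
        perf.learner = cox.lrn
      ),
      resampling = inner,
      par.set = params.rs,
      control = ctrl.rs,
      show.info = TRUE)
    bmr.rs = benchmark(learners = list(cox.filt.rs.lrn), tasks = mas.task,
                       resamplings = outer, measures = list(cindex), show.info = TRUE)

    For MBO, makeTuneControlMBO() (which requires the mlrMBO package) can be used in place of ctrl.rs.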