Tags: progress-bar, status, mlr3

mlr3 - benchmarking: status messages are only displayed after full benchmark is completed


I would like to monitor the progress of benchmark() in mlr3. Benchmarking several models, including hyperparameter tuning, on a large data set can take hours or even days. I would like to monitor the progress while benchmark() is running so that I can decide whether to abort it. In addition, if status messages were printed during the run, I could abort it after some parts have completed and still know how long certain steps took. For example, Naive Bayes might already have completed while the hyperparameter tuning for decision trees is still running (and has been for hours...). That way I could make appropriate changes for the next benchmark run (e.g., limit the search space for decision trees or go with only Naive Bayes).

The problem is that only the first message ("running resampling instances") is displayed at the beginning of the process. The rest only show up after the full benchmark has completed. In other words, for hours or even days the only status message displayed is the first one. If I abort the process, all information about the progress (e.g., the duration of individual steps) is lost.

This is a very short example. The relevant part of the code is adapted straight from the mlr3 book:

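    library("mlr3")
    library("mlr3learners")  # provides lrn("classif.xgboost"); needs the xgboost package

    task = tsk("sonar")      # placeholder: the original task is not shown
    resampling = rsmp("cv")  # placeholder: the original resampling is not shown

    design = benchmark_grid(
      tasks = task,
      learners = list(lrn("classif.featureless"), lrn("classif.xgboost")),
      resamplings = resampling
    )
    bmr = benchmark(design)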

The status message at the start of the benchmark process is displayed right away.

The other messages, including those reporting a status only microseconds after the start of the process, are only displayed once the full benchmark process has completed.


In this case, it only takes a minute. But if the benchmark took hours or days, it would be helpful to see each new status message right away rather than having to wait until the full process has completed.

I am aware that benchmark() supports progressr::with_progress(), but a percentage-level progress bar is not detailed enough for me.

Is there a way to get mlr3::benchmark() to display status messages right away throughout the entire process, not just after the process has finished?


Solution

  • The reason for this behaviour is the internal call to future.apply::future_mapply(). When base::mapply() is used instead, the output is printed directly. The latter can be enforced by setting options("mlr3.debug" = TRUE) as shown below.

    By default, i.e. with options("mlr3.debug" = FALSE), mlr3 uses future.apply::future_mapply(). I've opened a pull request to force the same immediate printing there as well.

    Here's a temporary workaround:

    library("mlr3")
    tasks = list(tsk("penguins"), tsk("sonar"))
    learners = list(lrn("classif.featureless"), lrn("classif.rpart"))
    resamplings = list(rsmp("cv"), rsmp("subsampling"))
    
    grid = benchmark_grid(tasks, learners, resamplings)
    print(grid)
    
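    # workaround: with mlr3.debug = TRUE, mlr3 uses base::mapply() instead of
    # future.apply::future_mapply(), so messages are printed immediately
    options("mlr3.debug" = TRUE)

    bmr = benchmark(grid)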
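Setting the option globally leaves it on for the rest of the session. The following is a minimal sketch, assuming you only want the debug behaviour for a single benchmark() call. The helper name `with_mlr3_debug()` is hypothetical and not part of mlr3. It relies only on base R: `options()` returns the previous values, and `on.exit()` restores them when the function exits, either normally or because of an error.

```r
# Hypothetical helper, not part of mlr3: evaluates `expr` with
# options(mlr3.debug = TRUE) and restores the previous value afterwards,
# even if `expr` stops with an error.
with_mlr3_debug = function(expr) {
  old = options(mlr3.debug = TRUE)   # options() returns the previous values
  on.exit(options(old), add = TRUE)  # restore them when the function exits
  expr  # lazy evaluation: `expr` is only forced here, while the option is set
}

# Usage (requires mlr3 and the `grid` object from the workaround above):
# bmr = with_mlr3_debug(benchmark(grid))
```

Restoring the option matters because base::mapply() runs sequentially. While the option is set, a parallel plan configured via the future package is presumably not used, so you probably want the default behaviour back for later runs.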