Suppose I want to train a model with caret in R, but split the training across two separate R sessions.
library(mlbench)
data(Sonar)
library(caret)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing <- Sonar[-inTraining,]
# First run session
nn.partial <- train(Class ~ ., data = training,
method = "nnet",
max.turns.of.iteration=5) # Non-existent parameter. But represents my goal
Let's assume that, instead of the full nn object, I have only a partial object (nn.partial) that holds the training state up to iteration 5. In a later session, I could then run the code below to finish the training job:
library(mlbench)
data(Sonar)
library(caret)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing <- Sonar[-inTraining,]
nn <- train(Class ~ ., data = training,
method = "nnet",
previous.training=nn.partial) # Non-existent parameter. But represents my goal
I am aware that neither max.turns.of.iteration nor previous.training exists in the train function. They are just my best attempt to express in code what the ideal solution would look like if train already supported it. Since these parameters do not exist, is there a way to achieve this goal (i.e. run the caret training in more than one session) by tricking the function in some way?
I have tried playing with the trainControl function, without success.
t.control <- trainControl(repeats=5)
nn <- train(Class ~ ., data = training,
method = "nnet",
trControl = t.control)
Even with this, the number of iterations is still much higher than the 5 I want in my example.
I am almost certain that this would be very complicated to implement in caret's current infrastructure. However, I will show you how to achieve this sort of thing out of the box with mlr3.
required packages for the example
library(mlr3)
library(mlr3tuning)
library(paradox)
get an example task and define a learner to be tuned:
task_sonar <- tsk('sonar')
learner <- lrn('classif.rpart', predict_type = 'prob')
define the hyper parameters to be tuned:
ps <- ParamSet$new(list(
ParamDbl$new("cp", lower = 0.001, upper = 0.1),
ParamInt$new("minsplit", lower = 1, upper = 10)
))
define the tuner and resampling strategy
tuner <- tnr("random_search")
cv3 <- rsmp("cv", folds = 3)
define the tuning instance
instance <- TuningInstance$new(
task = task_sonar,
learner = learner,
resampling = cv3,
measures = msr("classif.auc"),
param_set = ps,
# terminators can be combined (clock time, number of evaluations,
# early stopping on stagnation, performance reached); see ?Terminator
terminator = term("evals", n_evals = 100)
)
tune:
tuner$tune(instance)
Now press Stop in RStudio after about a second to interrupt the task, then inspect the archive:
instance$archive()
nr batch_nr resample_result task_id learner_id resampling_id iters params tune_x warnings errors classif.auc
1: 1 1 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7105586
2: 2 2 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7372720
3: 3 3 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7335368
4: 4 4 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7335368
5: 5 5 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7276246
6: 6 6 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7111217
7: 7 7 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.6915560
8: 8 8 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7452875
9: 9 9 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7372720
10: 10 10 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7172328
In my case it finished 10 iterations of random search. You can now, for instance, call
save.image()
then close RStudio and reopen the same project. Alternatively, use saveRDS/readRDS on the objects you wish to keep:
saveRDS(instance, "i.rds")
instance <- readRDS("i.rds")
After loading the required packages again, resume the tuning with
tuner$tune(instance)
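The save/restore step can be wrapped in a small "load the checkpoint if it exists, otherwise start fresh" helper so the same script works in the first and in every later session. The following is only a minimal sketch in base R: `load_or_create`, the file name, and the `evals_done` counter are hypothetical stand-ins for illustration, not part of mlr3.

```r
# Hypothetical checkpoint location; tempdir() is used only to keep the sketch self-contained.
checkpoint_file <- file.path(tempdir(), "checkpoint.rds")

# Return the saved object if a checkpoint exists, otherwise build a fresh one.
load_or_create <- function(path, create) {
  if (file.exists(path)) readRDS(path) else create()
}

# Stand-in for a long-running object such as a tuning instance.
state <- load_or_create(checkpoint_file, function() list(evals_done = 0L))

# Pretend some work was done in this session, then checkpoint it.
state$evals_done <- state$evals_done + 10L
saveRDS(state, checkpoint_file)

# In a later session, the same call picks up the saved state.
restored <- load_or_create(checkpoint_file, function() list(evals_done = 0L))
```

In a real project the checkpoint would live in the project directory rather than in tempdir(), which is cleared when the R session ends.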
Stop it again after a few seconds. In my case it finished an additional 12 iterations:
instance$archive()
nr batch_nr resample_result task_id learner_id resampling_id iters params tune_x warnings errors classif.auc
1: 1 1 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7105586
2: 2 2 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7372720
3: 3 3 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7335368
4: 4 4 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7335368
5: 5 5 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7276246
6: 6 6 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7111217
7: 7 7 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.6915560
8: 8 8 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7452875
9: 9 9 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7372720
10: 10 10 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7172328
11: 11 11 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7325289
12: 12 12 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7105586
13: 13 13 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7215133
14: 14 14 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.6915560
15: 15 15 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.6915560
16: 16 16 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7335368
17: 17 17 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7276246
18: 18 18 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7111217
19: 19 19 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7172328
20: 20 20 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7276246
21: 21 21 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7105586
22: 22 22 <ResampleResult> sonar classif.rpart cv 3 <list> <list> 0 0 0.7276246
Run it once more, this time without pressing Stop,
tuner$tune(instance)
and it will finish the 100 evaluations.
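The class and function names used above (ParamSet$new, TuningInstance$new, term, tuner$tune, instance$archive()) reflect the mlr3tuning release this was written against. As a hedged sketch, and assuming a recent release of mlr3, mlr3tuning and paradox, the same setup would look roughly like the code below, with ps()/p_dbl()/p_int(), ti(), trm() and tuner$optimize() taking over. The budget of 10 evaluations is chosen only to keep the sketch quick, and how a stopped run is resumed in newer releases should be checked against the current documentation.

```r
library(mlr3)
library(mlr3tuning)
library(paradox)

task_sonar <- tsk("sonar")
learner <- lrn("classif.rpart", predict_type = "prob")

# Search space for the two rpart hyperparameters from the example above.
search_space <- ps(
  cp = p_dbl(lower = 0.001, upper = 0.1),
  minsplit = p_int(lower = 1, upper = 10)
)

# Tuning instance with a small evaluation budget (10 instead of 100).
instance <- ti(
  task = task_sonar,
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.auc"),
  terminator = trm("evals", n_evals = 10),
  search_space = search_space
)

tuner <- tnr("random_search")
tuner$optimize(instance)

# The archive is a field in newer releases, not a method.
instance$archive$n_evals
```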
Limitation: the above example splits the tuning (evaluation of hyperparameters) across multiple sessions. However, it does not split a single training run across multiple sessions. Very few packages in R support this kind of thing; keras/tensorflow are the only ones I know of.
However, regardless of how long a single training run takes, tuning such an algorithm (evaluating its hyperparameters) takes much more time, so being able to pause and resume the tuning, as in the above example, is the more valuable capability.
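For the specific model in the question, one partial workaround is worth noting: nnet::nnet accepts a cap on optimizer iterations (maxit) and a vector of starting weights (Wts), and a fitted object exposes its weights as $wts. The sketch below calls nnet directly rather than going through caret::train, uses the whole Sonar data set for brevity, and assumes a fixed hidden layer size of 3. The checkpoint path is hypothetical. This is a warm start, not an exact continuation, because the optimizer's internal state is not carried over between the two calls.

```r
library(nnet)
library(mlbench)
data(Sonar)

# Session 1: stop the optimizer after 5 iterations and checkpoint the weights.
set.seed(998)
partial <- nnet(Class ~ ., data = Sonar, size = 3, maxit = 5, trace = FALSE)
wts_file <- file.path(tempdir(), "nnet_wts.rds")  # hypothetical checkpoint path
saveRDS(partial$wts, wts_file)

# Session 2: reload the weights and continue fitting from them.
start_wts <- readRDS(wts_file)
resumed <- nnet(Class ~ ., data = Sonar, size = 3, Wts = start_wts,
                maxit = 95, trace = FALSE)

partial$value  # fitting criterion after the first 5 iterations
resumed$value  # fitting criterion after continuing from the saved weights
```

The network architecture (size) and the data must be identical in both sessions, since the weight vector only makes sense for the network it was extracted from.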
If you find this interesting, here are some resources for learning mlr3:
https://mlr3book.mlr-org.com/
https://mlr3gallery.mlr-org.com/
Also take a look at mlr3pipelines: https://mlr3pipelines.mlr-org.com/articles/introduction.html