I'm using foreach and have been reading up on it.
My understanding is that you would use %dopar% for parallel processing and %do% for sequential.
As it happens, I was having issues with %dopar%, and while trying to debug I changed it to what I thought was a sequential loop using %do%. I happened to have the terminal open and noticed all processors running while I ran the loop.
Is this expected?
Reproducible example:
library(tidyverse)
library(caret)
library(foreach)
# I expected to see parallel processing here because of caret and xgb with train()
xgbFit <- train(Species ~ ., data = iris, method = "xgbTree",
                trControl = trainControl(method = "cv", classProbs = TRUE))
iris_big <- do.call(rbind, replicate(1000, iris, simplify = FALSE))
nr <- nrow(iris_big)
n <- 1000 # loop over in chunks of 1000 rows
pieces <- split(iris_big, rep(1:ceiling(nr / n), each = n, length.out = nr))
lenp <- length(pieces)
# I did not expect to see parallel processing take place when running the block below
predictions <- foreach(i = seq_len(lenp)) %do% {
  # get predictions for this chunk
  preds <- pieces[[i]] %>%
    mutate(xgb_prediction = predict(xgbFit, newdata = .))
  return(preds)
}
bah <- do.call(rbind, predictions)
My best guess would be that these are processes still running from previous runs.
It is the same when using foreach::registerDoSEQ()?
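For what it's worth, here is a minimal sketch of how the registered backend could be checked, assuming only the foreach package is loaded. getDoParRegistered(), getDoParName() and getDoParWorkers() are foreach functions; nothing here inspects what predict() itself does, so it can only rule the foreach backend in or out.

```r
library(foreach)

# Explicitly register the sequential backend
registerDoSEQ()

# Ask foreach what it currently has registered
backend_registered <- getDoParRegistered()  # is any backend registered?
backend_name <- getDoParName()              # name of the registered backend
backend_workers <- getDoParWorkers()        # number of workers foreach would use

print(backend_registered)
print(backend_name)
print(backend_workers)
```

If this reports a single worker and all processors are still busy during the loop, the extra threads are presumably coming from somewhere other than foreach.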
My second guess would be that predict runs in parallel.