I would like to use a foreach syntax for querying a locally stored database.
My code looks like that:
h <- odbcDriverConnect(connection)
cl<-makeCluster(no_cores)
registerDoParallel(cl)
foreach(i = 1:dim(Table)[1],
.combine = rbind,
.export = "h",
.packages = "RODBC") %dopar% {
cat(i,"\n")
#h <- odbcDriverConnect(connection)
sqlQuery(query)
}
odbcCloseAll()
stopCluster(cl)
When I use %do% instead of %dopar% it works properly, but when I try to make it a parallel I get this error:
Error in { : task 1 failed - "first argument is not an open RODBC channel" In addition: Warning message: In e$fun(obj, substitute(ex), parent.frame(), e$data) : already exporting variable(s): h
When I place h inside the foreach loop it works, however I understand that create connection on every single step and I would like to avoid that. Is there any way I could export that connection to foreach (the same way it is made with other variables)?
You can't export database connections to the workers because they can't be properly serialized and deserialized (since they contain things like socket connections). Instead, you need to create them on the workers, but for efficiency, you should create them once and then use them repeatedly from your foreach loops.
Since you're using doParallel
, you can initialize the workers with clusterEvalQ
:
clusterEvalQ(cl, {
library(RODBC)
connection <- "???" # how are you setting this?
h <- odbcDriverConnect(connection)
})
You can then use h
from your foreach loops as long as you prevent foreach from auto-exporting h
to the workers:
foreach(i = 1:dim(Table)[1],
.combine = rbind,
.noexport = "h", # make sure h is *not* exported!
.packages = "RODBC") %dopar% {
cat(i, "\n")
sqlQuery(h, query)
}
It's important to use .noexport="h"
, otherwise the good h
will be masked and you'll get an error from RODBC.