Search code examples
rforeachdoparallel

Writing to database in parallel in R


I try to write a table which is a processed subset of a global data variable, in a normal for loop this piece of code works fine but when I try to do it in parallel it raises an error.

Here is my piece of code;

library(doParallel)
library(foreach)
library(odbc)
library(data.table)

nc <- detectCores() - 1

cs <- makeCluster(nc)

registerDoParallel(cs)

con <- dbConnect(odbc(),driver = 'SQL Server',server = 'localserver',database = 'mydb', encoding = 'utf-8',timeout = 20)

range_to <- 1e6

set.seed(1)

random_df <- data.table(a = rnorm(n = range_to,mean = 2,sd = 1),
                        b = runif(n = range_to,min = 1,max = 300))


foreach(i=1:1000,.packages = c('odbc','data.table')) %dopar% {
  subk <- random_df[i,]
   
  subk <- subk**2

  odbc::dbWriteTable(conn = con,name = 'parallel_test',value = subk,row.names = FALSE,append = TRUE)
    
}

This code raises this error;

Error in {: task 1 failed - "unable to find an inherited method for function 'dbWriteTable' for signature '"Microsoft SQL Server", "character", "data.table"'"

Like I said before in a normal for loop it works fine.

Thanks in advance.


Solution

  • I solved that issue by changing only creating connection object method by;

    parallel::clusterEvalQ(cs, {library(odbc);con <- dbConnect(odbc(),driver = 'SQL Server',server = 'localserver',database = 'mydb', encoding = 'utf-8',timeout = 20)})