R: Parallelization with doParallel and foreach


I wrote the following small sequential example in R:

all_list <- list()
all_list[1] <- list(1:6000)
all_list[2] <- list(100000:450000)
all_list[3] <- list(600000:1700000)
all_list[4] <- list(2000000:3300000)
all_list[5] <- list(3600000:5000000)

find <- list(c(12800, 12800, 12800, 25600, 51200, 102400, 204800, 409600, 819200, 1638400, 1638400, 2457600, 3276800, 4096000, 4915200, 4915200))
result <- list()
index <- 1
current_Intervall <- 1
current_number <- 1

while(current_number <= 5000000){

  for(i in 1:length(find[[1]])){
    if(current_number == find[[1]][i]){
      result[[index]] <- current_number
      index <- index + 1
      break
    }
  }

  current_number <- current_number + 1
  last <- lengths(all_list[current_Intervall])
  if(current_number > all_list[[current_Intervall]][last]){
    if(current_Intervall == length(all_list)){
      break
    }else{
      current_Intervall <- current_Intervall + 1
      current_number <- all_list[[current_Intervall]][1]
    }
  }
  print(current_number)
}

I want to parallelize this code on Windows. I thought of the doParallel package and foreach loops, because I could not find a package that supports parallel while loops. This is what I have tried so far:

library(doParallel) 


all_list <- list()
all_list[1] <- list(1:6000)
all_list[2] <- list(100000:450000)
all_list[3] <- list(600000:1700000)
all_list[4] <- list(2000000:3300000)
all_list[5] <- list(3600000:5000000)

find <- list(c(12800, 12800, 12800, 25600, 51200, 102400, 204800, 409600, 819200, 1638400, 1638400, 2457600, 3276800, 4096000, 4915200, 4915200))
result <- list()
index <- 1
current_Intervall <- 1
current_number <- 1


no_cores <- detectCores() - 1  
cl <- makeCluster(no_cores)  
registerDoParallel(cl) 

print(current_number)

foreach(current_number=1:5000000) %dopar% {
  for(i in 1:length(find[[1]])){
    if(current_number == find[[1]][i]){
      result[[index]] <- current_number
      index <- index + 1
      break
    }
  }

  # current_number <- current_number + 1
  last <- lengths(all_list[current_Intervall])
  if(current_number > all_list[[current_Intervall]][last]){
    if(current_Intervall == length(all_list)){
      break
    }else{
      current_Intervall <- current_Intervall + 1
      current_number <- all_list[[current_Intervall]][1]
    }
  }
  print(current_number)
}

stopCluster(cl)

However, print produces no output, and the loop has still not terminated after about 2 minutes, whereas the sequential example finishes after a few seconds. I think something is wrong.
Another question: is it possible to redefine the counter in foreach loops? In the while loop above I can set the counter "current_number" to an arbitrary value. But I think that in R, for loops do not allow the counter to be redefined, right? Is there perhaps a better package or an alternative loop for parallelizing the first example?

Best regards, Brayn


Solution

  • If you want to output something when using parallelism, use makeCluster(no_cores, outfile = "").
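
The point above can be sketched as follows. This is only a minimal sketch, not a drop-in replacement for the code in the question. It assumes the goal is to collect every value of find that lies inside one of the intervals, so each interval becomes one independent task and no shared counter is needed. The name targets and the choice of 2 workers are my own assumptions. Whether the worker output actually shows up may depend on the console in use.

```r
library(doParallel)  # also attaches foreach and parallel

# Values to look for, copied from the question with duplicates removed.
# Named `targets` (hypothetical name) to avoid masking utils::find.
targets <- unique(c(12800, 25600, 51200, 102400, 204800, 409600, 819200,
                    1638400, 2457600, 3276800, 4096000, 4915200))

# The intervals to scan, as in the question.
all_list <- list(1:6000, 100000:450000, 600000:1700000,
                 2000000:3300000, 3600000:5000000)

# outfile = "" means worker output is not redirected to the null device.
# 2 workers is an arbitrary choice for this sketch.
cl <- makeCluster(2, outfile = "")
registerDoParallel(cl)

# One task per interval. Each task returns the targets found in its
# interval, and .combine = c concatenates the results in task order.
result <- foreach(interval = all_list, .combine = c) %dopar% {
  hits <- interval[interval %in% targets]
  print(length(hits))  # printed by the worker, see outfile above
  hits
}

stopCluster(cl)
```

Note that workers cannot modify result or index in the calling session, which is why each task returns its hits and foreach combines them. Splitting by interval also keeps the number of tasks small, so the overhead of sending work to the workers stays low compared with one task per number.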