foreach with doParallel doesn't work with more than 1 core


I run into a problem when I call this function:

  blocs <- split(df, 1 + (1:nrow(df)) %% ncores)
  cl <- makeCluster(ncores)
  registerDoParallel(cl)
  if (mode == "batch"){
    res <- foreach(i = blocs, .combine = "cbind", .export = c("batch_gradient_descent", "sampled_df", "add_constant", "sigmoid", "log_loss_function")) %dopar% {
      coefs <- batch_gradient_descent(df, colnames(X), colnames(y), learning_rate, max_iter)
    }
    return(res)
  }

When I run it with 1 core, it works. With 2 or more cores, execution never seems to enter the foreach body: nothing happens and no error is raised. I may be missing something, but after many hours of searching I could not find a solution.

Can someone give me a hint?


Solution

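  • blocs <- split(df, 1 + (1:nrow(df)) %% ncores) gives you ncores batches, but the body of your foreach loop calls batch_gradient_descent(df, ...) on the full df instead of the loop variable i, so every iteration works on identical data (e.g. just 3 copies of the same job). Try something like this instead: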

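    library(tidyverse)
    library(doParallel)

    ncores <- 3
    df <- iris

    # Assign each row to one of ncores batches, then nest into a list of data frames
    blocs <-
      df %>%
      mutate(batch = row_number() %% ncores) %>%
      nest(data = -batch) %>%
      pull(data)

    cl <- makeCluster(ncores)
    registerDoParallel(cl)

    # Use the loop variable i (one batch), not the full df
    res <- foreach(i = blocs, .combine = "rbind") %dopar% {
        Sys.sleep(5)  # simulate a long-running task
        mean(i$Sepal.Length)
    }

    stopCluster(cl)  # shut down the workers when done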
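
For completeness, the same idea can be sketched with base R's split(), which is closer to the code in the question. This is only a minimal sketch under assumptions: fit_block is a hypothetical stand-in for batch_gradient_descent (which the question does not show), and iris replaces the original data. As far as I can tell, split() with this index already yields ncores distinct blocks; the part that matters is that the loop body reads the loop variable i rather than df.

```r
library(doParallel)  # also attaches foreach and parallel

ncores <- 2
df <- iris

# Base-R split: each block holds a different subset of the rows.
blocs <- split(df, 1 + (seq_len(nrow(df)) %% ncores))

# Hypothetical stand-in for batch_gradient_descent():
# fits a simple linear model on one block and returns its coefficients.
fit_block <- function(block) {
  coef(lm(Sepal.Length ~ Sepal.Width, data = block))
}

cl <- makeCluster(ncores)
registerDoParallel(cl)

# The body uses `i` (one block per iteration), not the full `df`.
res <- foreach(i = blocs, .combine = "cbind") %dopar% {
  fit_block(i)
}

stopCluster(cl)  # release the worker processes
print(res)
```

Each column of res then holds the coefficients fitted on one block. If the real function is not defined in the environment where foreach is called, it would still need to be listed in .export, as in the question.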