Search code examples
rforeachregressionnlmedoparallel

How can I load custom functions into foreach loop in R?


I am trying to run gls models with a specific spatial correlation structure that comes from modifying the nlme package/ building new functions in the global environment from this post (the answer from this post that creates new functions which allows for the implementation of the correlation structure). Unfortunately I cannot get this spatial correlation structure to work when I run this through a foreach loop:

#setup example data
data("mtcars")
mtcars$lon = runif(nrow(mtcars)) #include lon and lat for the new correlation structure
mtcars$lat = runif(nrow(mtcars))
mtcars$marker = c(rep(1, nrow(mtcars)/2), rep(2, nrow(mtcars)/2)) #values for iterations

#set up cluster
detectCores()
cl <- parallel::makeCluster(6, setup_strategy = "sequential")
doParallel::registerDoParallel(cl)

#run model
list_models<-foreach(i=1:2, .packages=c('nlme'), .combine = cbind,
                     .export=ls(.GlobalEnv)) %dopar% {
                    
                       .GlobalEnv$i <- i
                       
                       model_trial<-gls(disp ~ wt, 
                                             correlation = corHaversine(form=~lon+lat, 
                                                                        mimic="corSpher"),
                                             data = mtcars)
                     }


stopCluster(cl)

When I run this I get the error message:

Error in { : 
  task 1 failed - "do not know how to calculate correlation matrix of “corHaversine” object"
In addition: Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
  already exporting variable(s): corHaversine, mtcars, path_df1

The model works fine with the added correlation structure :

correlation = corHaversine(form=~lon+lat,mimic="corSpher")

in a normal loop. Any help would be appreciated!


Solution

  • I'm not sure why your foreach approach doesn't work, andd I'm also not sure what you're actually calculating. Anyway, you may try this alternative approach using parallel::parLapply() which seems to work:

    First, I cleared workspace using rm(list=ls()), then I ran the entire first codeblock of this answer where they create "corStruct" class and corHaversine method to have it in workspace as well as the Data below, ready for clusterExport().

    library(parallel)
    cl <- makeCluster(detectCores() - 1)
    clusterEvalQ(cl, library(nlme))
    clusterExport(cl, ls())
    r <- parLapply(cl=cl, X=1:2, fun=function(i) {
      gls(disp ~ wt, 
          correlation=corHaversine(form= ~ lon + lat, mimic="corSpher"),
          data=mtcars)
    })
    stopCluster(cl)  ## stop cluster
    r  ## result
    # [[1]]
    # Generalized least squares fit by REML
    # Model: disp ~ wt 
    # Data: mtcars 
    # Log-restricted-likelihood: -166.6083
    # 
    # Coefficients:
    #   (Intercept)          wt 
    # -122.4464    110.9652 
    # 
    # Correlation Structure: corHaversine
    # Formula: ~lon + lat 
    # Parameter estimate(s):
    #   range 
    # 10.24478 
    # Degrees of freedom: 32 total; 30 residual
    # Residual standard error: 58.19052 
    # 
    # [[2]]
    # Generalized least squares fit by REML
    # Model: disp ~ wt 
    # Data: mtcars 
    # Log-restricted-likelihood: -166.6083
    # 
    # Coefficients:
    #   (Intercept)          wt 
    # -122.4464    110.9652 
    # 
    # Correlation Structure: corHaversine
    # Formula: ~lon + lat 
    # Parameter estimate(s):
    #   range 
    # 10.24478 
    # Degrees of freedom: 32 total; 30 residual
    # Residual standard error: 58.19052 
    

    Data:

    set.seed(42)  ## for sake of reproducibility
    mtcars <- within(mtcars, {
      lon <- runif(nrow(mtcars))
      lat <- runif(nrow(mtcars))
      marker <- c(rep(1, nrow(mtcars)/2), rep(2, nrow(mtcars)/2))
    })