
How to Write R Package Documentation for a Function with Parallel Backend


I want to turn this function into an R package:


#' create sums package
#'
#' More detailed Description
#'
#' @describeIn This sums helps to
#'
#' @importFrom foreach foreach
#'
#' @importFrom doParallel registerDoParallel
#'
#' @param x Numeric Vector
#'
#' @importFrom doParallel `%dopar%`
#'
#' @importFrom parallel parallel
#'
#' @export
sums <- function(x){
plan(multisession)
n_cores <- detectCores() # check how many cores are present in the operating system
cl <- parallel::makeCluster(n_cores) # use all the cores detected
doParallel::registerDoParallel(cores = detectCores())

    ss <- function(x){
  `%dopar%` <- foreach::`%dopar%`
   foreach::foreach(i = x, .combine = "+") %dopar% {i}
     }
    sss <- function(x){
   `%dopar%` <- foreach::`%dopar%`
   foreach::foreach(i = x, .combine = "+") %dopar% {i^2}
}

ssq <- function(x){
   `%dopar%` <- foreach::`%dopar%`
   foreach::foreach(i = x, .combine = "+") %dopar% {i^3}
}

sums <- function(x, methods = c("sum", "squaredsum", "cubedsum")){

  output <- c()

  if("sum" %in% methods){
    output <- c(output, ss = ss(x))
  }

  if("squaredsum" %in% methods){
    output <- c(output, sss = sss(x))
  }

  if("cubedsum" %in% methods){
    output <- c(output, ssq = ssq(x))
  }

  return(output)
}

parallel::stopCluster(cl = cl)
x <- 1:10

sums(x)


What I Need

Suppose my vector x is so large (e.g. x <- 1:9e9) that serial processing would take about 5 hours, so that parallel processing can help. How do I include:

n_cores <- detectCores()
#cl <- makeCluster(n_cores)
#registerDoParallel(cores  =  detectCores())

in my .R file and DESCRIPTION file so that the package meets the standards of R package documentation?


Solution

  • Even though the scope of the question is not entirely clear, I'll try to make relevant suggestions. I understand that you have problems running R CMD check on your package with examples/tests that use parallel computation.

    • First of all, remember that R CMD check --as-cran applies CRAN standards, and CRAN does not allow examples or tests to use more than 2 cores. So your examples must be simple enough to run on 2 cores.
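    • Then there is a problem in your code: you create a cluster with parallel::makeCluster() but never use it, because doParallel::registerDoParallel(cores = detectCores()) sets up its own parallel backend instead of receiving cl. Pass the cluster explicitly with registerDoParallel(cl = cl).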
    • Next, your code uses the foreach, parallel and doParallel packages, so they must be declared in the DESCRIPTION file. You can do this by running in your console:
    usethis::use_package("foreach")
    usethis::use_package("parallel")
    usethis::use_package("doParallel")
    

    This adds the packages to the "Imports" field of your DESCRIPTION. You then should not attach them explicitly with library() or require() inside your package code.

    • Then, in your example, you should also make each call explicit by prefixing the function with its package name and "::", which would make your example look like:
        n_cores <- 2
        cl <- parallel::makeCluster(n_cores)
        doParallel::registerDoParallel(cl = cl)
        ...
        parallel::stopCluster(cl = cl)
    

    The help page of registerDoParallel() contains a similar piece of code, and you will see that it is also limited to 2 cores.
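If you do not want to hard-code 2 cores in the function itself, one option is to cap the worker count only when the check environment asks for it. The sketch below assumes the convention used by the parallel package, where R CMD check --as-cran sets the environment variable `_R_CHECK_LIMIT_CORES_` and any non-empty value other than "false" makes spawning more than 2 processes an error (or a warning for "warn"). The helper name `n_workers()` is hypothetical and not part of parallel or doParallel.

```r
# Hypothetical helper (not part of parallel or doParallel): how many workers to start.
# Assumption: during R CMD check --as-cran, _R_CHECK_LIMIT_CORES_ is set and
# spawning more than 2 processes is then refused, so cap the count at 2.
n_workers <- function(maxcores = Inf) {
  limit <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", unset = ""))
  limited <- nzchar(limit) && limit != "false"
  # detectCores() can return NA on some platforms; fall back to 1 core
  available <- parallel::detectCores()
  if (is.na(available)) available <- 1L
  n <- min(maxcores, available)
  if (limited) n <- min(n, 2L)
  # always start at least one worker
  max(1L, as.integer(n))
}
```

In an example or test you would then write `cl <- parallel::makeCluster(n_workers())` followed by `doParallel::registerDoParallel(cl = cl)`, and stop the cluster with `parallel::stopCluster(cl)` at the end, as in the example above.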

    For completeness, I do not think you really need the foreach package, since the parallel package that ships with R is already very capable. If you want your function to use detectCores(), I would suggest adding a parameter that limits the number of cores. This function should do what you want in a more "R-like" manner:

    sums <- function(x, methods = c("sum", "squaredsum", "cubedsum"), maxcores = 2) {
      # use at most maxcores, and never more cores than the machine has
      n_cores <- min(maxcores, parallel::detectCores(), na.rm = TRUE)
      cl <- parallel::makeCluster(n_cores)
      # stop the cluster when the function exits, even if an error occurs
      on.exit(parallel::stopCluster(cl = cl))
      
      outputs <- sapply(
        X = methods,
        FUN = function(method) {
          if ("sum" == method) {
            output <- parallel::parSapply(
              cl = cl,
              X = x,
              FUN = function(i)
                i
            )
          }
          
          if ("squaredsum" == method) {
            output <-
              parallel::parSapply(
                cl = cl,
                X = x,
                FUN = function(i)
                  i^2
              )
          }
          
          if ("cubedsum" == method) {
            output <-
              parallel::parSapply(
                cl = cl,
                X = x,
                FUN = function(i)
                  i^3
              )
          }
          
          # convert to double so that large sums do not overflow the integer range
          return(sum(as.numeric(output)))
        }
      )
      
      return(outputs)
    }
    
    
    x <- 1:10000000
    
    sums(x = x, methods = c("sum", "squaredsum"), maxcores = 2)
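To address the documentation part of the question more directly, here is a sketch of how such a function could carry roxygen2 comments, with the core count as a documented parameter and an example that stays within the 2-core limit. It assumes roxygen2 is used and that parallel is listed under Imports in DESCRIPTION; the `@importFrom` line is optional when every call is written as `parallel::fun()`. The name `sums_par` is hypothetical, and the body uses a different technique from the function above: it splits x into one chunk per worker with `parallel::splitIndices()` and has each worker return a partial sum, instead of returning every element through `parSapply()`.

```r
#' Sums of powers computed in parallel
#'
#' Computes the sum, the sum of squares and/or the sum of cubes of a numeric
#' vector, splitting the work across a socket cluster.
#'
#' @param x A numeric vector.
#' @param methods Any of "sum", "squaredsum" and "cubedsum".
#' @param maxcores Maximum number of worker processes; the number actually
#'   used is min(maxcores, parallel::detectCores()).
#'
#' @return A named numeric vector with one element per requested method.
#'
#' @importFrom parallel makeCluster stopCluster parLapply detectCores splitIndices
#' @export
#'
#' @examples
#' # CRAN checks allow at most 2 cores in examples
#' sums_par(1:100, maxcores = 2)
sums_par <- function(x, methods = c("sum", "squaredsum", "cubedsum"),
                     maxcores = 2) {
  methods <- match.arg(methods, several.ok = TRUE)
  x <- as.numeric(x)  # doubles, so large sums cannot overflow integers
  n_cores <- min(maxcores, parallel::detectCores(), na.rm = TRUE)
  cl <- parallel::makeCluster(n_cores)
  on.exit(parallel::stopCluster(cl))  # runs even if an error occurs

  # one chunk of x per worker, so each worker runs a single task per method
  chunks <- lapply(parallel::splitIndices(length(x), n_cores),
                   function(idx) x[idx])

  # worker function; giving it the global environment means serializing it
  # does not also copy x and chunks from this function's environment
  worker <- function(chunk, p) sum(chunk^p)
  environment(worker) <- globalenv()

  # power used by each requested method, named after the method
  powers <- c(sum = 1, squaredsum = 2, cubedsum = 3)[methods]
  vapply(powers, function(p) {
    partial <- parallel::parLapply(cl, chunks, worker, p = p)
    sum(unlist(partial))
  }, numeric(1))
}
```

Running devtools::document() (or roxygen2::roxygenise()) would turn these comments into the man/sums_par.Rd help page and the matching NAMESPACE entries.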