
R Global assignment operator in a function - what's a better alternative?


I have a function in a package (mainly for my own use for now, though I might share it at some point). I'm trying to replace a slow for loop with lapply so that I can parallelise it later. One option I found that is hugely faster, even without parallelisation, is to use the global assignment operator (<<-). However, I'm anxious about this because it seems to be frowned upon, and I'm not used to thinking about environments, so I worry about side effects.

Here is a simple reprex:



n <- 2
nx <- 40
v <- 5
d <- 3

array4d <- array(rep(0, n * nx * v * d),
                 dim = c(n, nx, v, d))
array4d2 <- array4d

# Make some data to enter into the array. In the real problem a function generates this data depending on input variables.

set.seed(4)
dummy_output <- lapply(1:v, function(i) runif(n*nx*d))

microbenchmark::microbenchmark( {
    for(i in 1:v){
        array4d[ , , i, ] <- dummy_output[[i]]
    }
}, {
    lapply(1: v, function(i) {
        array4d2[ , , i, ] <<- dummy_output[[i]]
    })
})

Unit: microseconds
 expr                                                                          min        lq       mean    median       uq      max neval cld
 { for (i in 1:v) { array4d[, , i, ] <- dummy_output[[i]] } }             1183.504 1273.6205 1488.26909 1411.4565 1515.762 3535.974   100   b
 { lapply(1:v, function(i) { array4d2[, , i, ] <<- dummy_output[[i]] }) }   13.257   16.1715   33.56976   18.1445   21.150 1525.608   100  a 

> identical(array4d, array4d2)
[1] TRUE

All of this would be happening inside a function called many times by its parent.

So this is (lots!) faster. But my questions are:

  1. Is it safe to do this?
  2. Is there a similarly fast alternative that does not use <<-?
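
For context on question 1, here is a minimal sketch of how <<- picks its target when the lapply call sits inside a function, as described above. The names fill_array and arr_local are hypothetical and used only for illustration. <<- searches the enclosing environments for an existing variable of that name, so here it modifies the function's own local array rather than anything in the global environment. If no such variable existed in any enclosing environment, a plain x <<- value would create one in the global environment, which is the side effect to watch for.

```r
# Setup repeated from the reprex so the sketch runs on its own.
n <- 2; nx <- 40; v <- 5; d <- 3
set.seed(4)
dummy_output <- lapply(1:v, function(i) runif(n * nx * d))

# Hypothetical wrapper: the array is local to the function.
fill_array <- function(dummy_output, dims) {
  arr_local <- array(0, dim = dims)
  lapply(seq_along(dummy_output), function(i) {
    # `<<-` looks outward from the anonymous function and finds
    # `arr_local` in fill_array's environment, so that is what it modifies.
    arr_local[, , i, ] <<- dummy_output[[i]]
  })
  arr_local
}

res <- fill_array(dummy_output, c(n, nx, v, d))

# Check whether a global variable named `arr_local` was created.
leaked <- exists("arr_local", envir = globalenv(), inherits = FALSE)
```

In this arrangement the assignment stays inside fill_array, so repeated calls from a parent function do not share state through the global environment.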

Solution

  • Make the varying dimension the last one. microbenchmark indicates that its performance is not statistically different from that of the version using a global variable (<<-). If it is important that the varying dimension be the third, use aperm(x, c(1, 2, 4, 3)) afterwards.

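    # Assumed definition: same shape as array4d but with the varying dimension (v) last
    array4d3 <- array(0, dim = c(n, nx, d, v))

    microbenchmark::microbenchmark(
        a = for(i in 1:v) array4d[ , , i, ] <- dummy_output[[i]],
        b = lapply(1:v, function(i) array4d2[ , , i, ] <<- dummy_output[[i]]),
        c = array(unlist(dummy_output), dim(array4d3))
    )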
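
As a quick check of the approach above, the following sketch builds the array with the varying dimension last, permutes it back with aperm, and compares it with the loop-filled array from the question. The names x and result are illustrative and not part of the original answer; the setup repeats the question's reprex so the block runs on its own.

```r
n <- 2; nx <- 40; v <- 5; d <- 3
set.seed(4)
dummy_output <- lapply(1:v, function(i) runif(n * nx * d))

# Reference: fill the third dimension slice by slice, as in the question.
array4d <- array(0, dim = c(n, nx, v, d))
for (i in 1:v) {
  array4d[, , i, ] <- dummy_output[[i]]
}

# Vectorised: each list element becomes one slice along the LAST dimension,
# because R fills arrays in column-major order.
x <- array(unlist(dummy_output), dim = c(n, nx, d, v))

# Swap dimensions 3 and 4 so the varying dimension is third again.
result <- aperm(x, c(1, 2, 4, 3))

same <- identical(result, array4d)
```

If the two constructions agree, same is TRUE and result can stand in for the loop-filled array4d without any use of <<-.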