Search code examples
rperformancevectorization

R: how to vectorize code to remove for loop


I am writing a Monte Carlo simulation in R that I need to execute 100,000 times. I am having some efficiency problems. A key efficiency problem that I am having is that I have a for loop inside of the larger Monte Carlo for loop. I would like to try and remove this loop, if possible, but am currently stumped.

I have a dataframe which contains a value along with a start, and end which are indexes into the final matrix.

Here is a sample code snipet:

a <- data.frame( value = c( 3, 10, 5, 8),
                 start = c(2, 3, 4, 5), 
                 end = c( 9, 10, 9, 8 ))

b <- matrix( 0, nrow = nrow(a), ncol = 10)

# this is the for loop that I would like to remove
for ( i in 1:nrow(a) ) {
  b[ i, a$start[i]:a$end[i] ]<- a$value[i]
}

It feels as if I should be able to reframe the problem into a join of some type but I haven't been able to make progress. Any help is appreciated.


Solution

  • Vectorization with with rep.int, sequence, and matrix indexing:

    len <- a$end - a$start + 1
    b[matrix(c(rep.int(1:nrow(a), len), sequence(len, a$start)), ncol = 2)] <- rep.int(a$value, len)
    

    On a larger dataset, the vectorized version is > 13x faster:

    a <- data.frame(value = sample(10, 1e5, replace = TRUE),
                    start = sample(5, 1e5, replace = TRUE), 
                    end = sample(6:10, 1e5, replace = TRUE))
    b <- matrix(0, nrow = nrow(a), ncol = 10)
    
    vecfill <- function(a, b) {
      len <- a$end - a$start + 1
      b[matrix(c(rep.int(1:nrow(a), len), sequence(len, a$start)), ncol = 2)] <- rep.int(a$value, len)
      return(b)
    }
    
    iterfill <- function(a, b) {
      for ( i in 1:nrow(a) ) {
        b[ i, a$start[i]:a$end[i] ]<- a$value[i]
      }
      
      return(b)
    }
    
    microbenchmark::microbenchmark(vecfill(a, b), iterfill(a, b), times = 100)
    #> Unit: milliseconds
    #>            expr      min        lq      mean    median       uq      max neval
    #>   vecfill(a, b)  19.5291  19.99705  24.72165  21.01205  24.0373  75.8988   100
    #>  iterfill(a, b) 292.6082 310.52755 330.09472 319.50020 331.3736 560.9486   100