Search code examples
rperformancenested-loops

Speed up taking dot product of all pairs of rows and adding intercept term


I am applying 100 different model simulations to 83 data points and examining the range of estimates from each model simulation for each data point.

Each calculation itself is the product of 342 variables multiplied by 342 coefficients, plus an intercept added. The code I wrote below calculates the values correctly, but is heinously slow. Is there any way to improve processing speed?

spec = read.csv(spectra)
coef = read.csv(coefficents)

shell     = matrix(data=NA,ncol=101,nrow=nrow(spec))
shell     = as.data.frame(shell)
heading = paste("Sim_",seq(1,100,1),sep="")
names(shell)[1] = "Filename"
names(shell)[2:101] = heading

shell[1] = spec[1]

for (i in 1:nrow(spec))
{
  for (j in 1:100)
  {
    shell[i,j+1] = sum(spec[i,2:341]*coef[j,3:342]) + coef[j,2]
  }
}

Solution

  • Really you are performing matrix multiplication of spec with the transpose of coef and then adding a constant to each column. You should get a speedup by using the built-in matrix multiplication function %*% and vectorized operations to scale the columns:

    out <- cbind(spec[,1], t(t(spec[,2:341] %*% t(coef[1:100,3:342])) + coef[1:100,2]))
    

    The output of this 1-liner is identical to the output of the code in the original post (modified slightly to take and output matrices and not to set any names):

    OP <- function(spec, coef) {
      shell = matrix(data=NA,ncol=101,nrow=nrow(spec))
      shell[,1] <- spec[,1]
      for (i in 1:nrow(spec)) {
        for (j in 1:100) {
          shell[i,j+1] = sum(spec[i,2:341]*coef[j,3:342]) + coef[j,2]
        }
      }
      shell
    }
    all.equal(out, OP(spec, coef))
    # [1] TRUE
    

    In terms of runtimes, the vectorized operations yield significant benefit (38x), even for this relatively small example (1000 rows in spec):

    system.time(cbind(spec[,1], t(t(spec[,2:341] %*% t(coef[1:100,3:342])) + coef[1:100,2])))
    #    user  system elapsed 
    #   0.028   0.001   0.030 
    system.time(OP(spec, coef))
    #    user  system elapsed 
    #   0.927   0.224   1.161 
    

    Data:

    set.seed(144)
    spec <- matrix(rnorm(1000*341), nrow=1000)
    coef <- matrix(rnorm(100*342), nrow=100)