Search code examples
rperformancematlabvectorizationreshape

Fastest R equivalent to MATLAB's reshape() method?


I am converting a MATLAB script into R and regretting it so far, as it is slower at the moment. I'm trying to use "vectorized functions" as much as possible, but I'm relatively new to R and do not know what is meant by this. From my research for loops are only slower than the apply() method in R if you use loads of operators (including the parenthesis). Otherwise, I don't see what R could have done to slow down it further. Here is code that works that I want to speed up.

somPEs   <- 9;
inputPEs <- 6;
initial_w <- matrix(1, nrow=somPEs, ncol=inputPEs) 
w <- apply(initial_w, 1, function(i) runif(i));
# Reshape w to a 3D matrix of dimension: c(sqrt(somPEs), sqrt(somPEs), inputPEs)
nw <- array(0, dim=c(sqrt(somPEs), sqrt(somPEs), inputPEs))
for (i in 1:inputPEs) {
  nw[,,i] <- matrix(w[i,], nrow=sqrt(somPEs), ncol=sqrt(somPEs), byrow=TRUE)
}
w <- nw;

In MATLAB, this code is executed by a built-in function called "reshape", as is done as below:

w = reshape(w,[sqrt(somPEs) sqrt(somPEs) inputPEs]);

I timed my current R code and it's actually super fast, but I'd still like to learn about vectorization and how to convert my code to apply() for readability's sake.

user  system elapsed 
0.003   0.000   0.002 

Solution

  • The first step is to convert your array w from 6x9 to 3x3x6 size, which in your case can be done by transposing and then changing the dimension:

    neww <- t(w)
    dim(neww) <- c(sqrt(somPEs), sqrt(somPEs), inputPEs)
    

    This is almost what we want, except that the first two dimensions are flipped. You can use the aperm function to transpose them:

    neww <- aperm(neww, c(2, 1, 3))
    

    This should be a good deal quicker than looping through the matrix and individually copying over data by row. To see this, let's look at a larger example with 10,000 rows and 100 columns (which will be mapped to a 10x10x10k matrix):

    josilber <- function(w) {
      neww <- t(w)
      dim(neww) <- c(sqrt(dim(w)[2]), sqrt(dim(w)[2]), dim(w)[1])
      aperm(neww, c(2, 1, 3))
    }
    OP <- function(w) {
      nw <- array(0, dim=c(sqrt(dim(w)[2]), sqrt(dim(w)[2]), dim(w)[1]))
      for (i in 1:(dim(w)[1])) {
        nw[,,i] <- matrix(w[i,], nrow=sqrt(dim(w)[2]), ncol=sqrt(dim(w)[2]), byrow=TRUE)
      }
      nw
    }
    bigw <- matrix(runif(1000000), nrow=10000, ncol=100)
    all.equal(josilber(bigw), OP(bigw))
    # [1] TRUE
    microbenchmark(josilber(bigw), OP(bigw))
    # Unit: milliseconds
    #            expr       min       lq      mean     median        uq       max neval
    #  josilber(bigw)  8.483245  9.08430  14.46876   9.431534  11.76744  135.7204   100
    #        OP(bigw) 83.379053 97.07395 133.86606 117.223236 129.28317 1553.4381   100
    

    The approach using t, dim, and aperm is more than 10x faster in median runtime than the looping approach.