r performance loops nested-loops multicore

How could I speed up my R code?

Disclaimer

Hello everyone! I recently started programming in R. My codes are working just fine, but in terms of speed some of them are taking way too long to put to good use. I hope someone can help me making this code run faster either by optimising the code, or with the use of one of the multicore packages.

About the code

I have large datasets containing about 15000 numeric data each. The code takes two parameters (p, n) where p >= n, and make subsets of the data. It applies the zyp.yuepilon function (from the zyp package) to each row of the subsets. Then the parameter n is used to apply the same function on an n sized subset.

Problem is I run this code in a nested for loop: p in 10:40 and n in 10:40 so it takes an eternity to get the results, and it's just one dataset among many others.

sp <- function(p, n){
  library(zyp)

  data <- runif(15000, 1, 4)
  lower <- seq(80 - p + 1, by=1, length.out=length(data)-81)
  upper <- lower + p - 1

  subsets <- matrix(nrow=length(lower), ncol=p)
  for(j in 1:length(lower)){
    subsets[j, ] = data[lower[j] : upper[j]]
  }

  ret <- apply(subsets, 1, zyp.yuepilon)

  subset_n <- subsets[, 1:n]
  ret2 <- apply(subset_n, 1, zyp.yuepilon)
  return(list(ret, ret2))
}

Benchmark results in seconds:

expr      min       lq   median       uq      max neval
sp(7, 6) 92.77266 94.24901 94.53346 95.10363 95.64914    10

Solution

Here is a series of comments, rather than an answer.

Looking at the zyp.yuepilon function body, by calling the function without parenthesis in a R session, you see that this function, and the function zyp.sen are written in plain R code (as opposed to compiled code).

The biggest speed-up is likely attained by using the Rcpp package which facilitates calling (compiled) C++ code within R. In fact, there is a small linear model example here Fast LM model using Rcpp/RcppArmadillo.

I would be inclined to rewrite the two functions zyp.yuepilon and zyp.sen in C++, using Rcpp, including the loop over subset vectors (for which you are currently using apply to do).

For general R speed-up issues see this question R loop performance, as well as the R package plyr, which may provide an entry point for taking a map-reduce type of approach to your problem.

If you want to steer clear of C++, then a series of micro-optimisations would be your quickest win. To speed up the apply aspect of your code, you could use something like this

library(doParallel)
library(parallel)
library(foreach)
library(zyp)

cl<-makeCluster(4)
registerDoParallel(cl)

sp_1<-function(p=7, n=6){
N_ob=15000; 
off_set=81; 
N_ob_o=N_ob-off_set; 

am<-matrix(runif(N_ob*p),ncol=p); 
subsets<-am[-(1:off_set),];
ret=matrix(unlist( foreach(i=1:N_ob_o) %dopar% zyp::zyp.yuepilon(subsets[i,]),use.names=FALSE),ncol=11, byrow=TRUE);

subset_n <- subsets[, 1:n]
ret2=matrix(unlist( foreach(i=1:N_ob_o) %dopar% zyp::zyp.yuepilon(subset_n[i,]),use.names=FALSE),nrow=11);
return(list(ret, ret2))
}

sp<-function(p=7, n=6){

data <- runif(15000, 1, 4)
lower <- seq(80 - p + 1, by=1, length.out=length(data)-81)
upper <- lower + p - 1

subsets <- matrix(nrow=length(lower), ncol=p)
for(j in 1:length(lower)){
subsets[j, ] = data[lower[j] : upper[j]]
}

ret <- apply(subsets, 1, zyp.yuepilon)

subset_n <- subsets[, 1:n]
ret2 <- apply(subset_n, 1, zyp.yuepilon)
return(list(ret, ret2))
}

system.time(sp_1())
system.time(sp())

This gives me a speed-up of around a factor of 2. But this will depend on your platform, etc. Check out the help files for the functions and packages above, and tune the number of clusters using makeCluster to see what works best for your platform (in the absence of any information about your particular set-up).

Another route might be to make use of the byte-code compiler via library(compiler) to see if the various functions can be optimised, this way.

library(compiler)
enableJit(3);
zyp_comp<-cmpfun(zyp.yuepilon);