Tags: r, algorithm, optimization, rcpp

Can Rcpp be used to speed up calls to other R functions?


I am writing an R package for statistical analysis and machine learning (ML) that is often very slow. It is slow because it involves training models, both statistical and machine learning, and making predictions with them. My package is model-agnostic: it interfaces with any other R package for model training and prediction, retraining that package's models and using them to make predictions. After extensive profiling and code refactoring (mainly converting as much as possible to vectorized and matrix operations), I have found that the slow points I cannot speed up further by refactoring come down to code that:

  • calls predict functions from other R packages. My main procedures might call a predict function literally thousands of times, so even a predict function that takes 0.1 seconds can make my functions take many minutes or even a few hours to run.
  • calls model-training functions from other R packages. A few of my procedures retrain the input models 100 to 1000 times, so a training time of 1 second per model already means around 17 minutes of run time (1000 × 1 s ≈ 16.7 minutes). Slower training than that becomes really unmanageable.

I would like to know if Rcpp can help speed things up in my situation. Please note what I am not asking here:

  • I am not asking whether I really need to train models and predict with them that many times. I am pursuing that important question separately and am indeed trying to minimize those needs as much as possible. So, I am asking under the assumption that I really do need to train models and make predictions that often.
  • I certainly intend to implement parallel processing to help alleviate the problem, but this is only a partial solution. Even if some users have as many as 10 physical cores (which very few users would have), dividing the run times above by 10 would still leave slow code. I'm trying to go further than that; parallel processing would be an addition to whatever else I can do.

My key doubt about whether Rcpp can help is that the slowest code is where I call other packages' R functions. I've been reading a lot about Rcpp, and I'm even taking the DataCamp course on that topic. However, although many sources explain why we would want to use Rcpp (to speed up slow R code), I have been unable to find any source that clearly spells out what kinds of problems Rcpp cannot help with.

From what I've gathered, Rcpp cannot provide any speed-up when it calls R functions, and the functions that slow down my code are R functions from other packages. For example, I have an article that demonstrates my package functionality using nnet::nnet() and nnet::predict.nnet() to train and predict a neural network, respectively, and gbm::gbm() and gbm::predict.gbm() to train and predict a gradient boosted machine, respectively. Is there any way to use Rcpp to optimize the calls to these functions?

If I could call Rcpp::cppFunction() at run time to take these functions, compile them to C++, and then continue executing them within my program, that could be a viable solution. But is that even possible with Rcpp? I would appreciate any guidance here. I am also willing to accept a clearly explained answer of, "No, Rcpp cannot help in your case, and here's why."


Solution

  • It appears some aspects are being confused here:

    • R code runs as R code at the speed of R code, even when called from Rcpp. There is no magic change Rcpp can effectuate that changes the R interpreter.
    • C++ code runs at the speed of compiled code, which is generally faster than R and sometimes a lot faster.
    • One can mix and match: C++ code can call R code, for example via Rcpp::Function. It is documented that this had considerably higher overhead in the very early days (and that is what the previous answer refers to), but it got much better several years ago (see the release NEWS or ChangeLog). There is still some overhead, but the approach is entirely viable, and many packages (including Rcpp itself) do that for some tasks.
    • One generally does not want to call cppFunction() repeatedly, or at each start of a package: when Rcpp code helps, it is trivial to wrap it in a package.
    • One can also mix and match in the other direction: some packages pass compiled functions to R functions.

    Most importantly, it pays off to follow a general rule: do not conjecture, but profile and measure.
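To make the first and third points concrete, and to follow the "profile and measure" advice, here is a minimal sketch. It assumes Rcpp and a working C++ toolchain are installed, and it uses an lm() fit on the built-in mtcars data as a stand-in for a slower model such as an nnet or gbm fit. The helper names predict_many_cpp() and predict_many_r() are hypothetical, made up for this illustration. The C++ loop calls predict() back in R through Rcpp::Function, so the loop itself is compiled, but every prediction is still evaluated by the R interpreter.

```r
library(Rcpp)

# Hypothetical helper: a compiled loop that calls back into R via
# Rcpp::Function. Each predict_fun() call is still evaluated by R.
cppFunction('
NumericVector predict_many_cpp(Function predict_fun, RObject model,
                               DataFrame newdata, int n) {
  NumericVector out(n);
  for (int i = 0; i < n; ++i) {
    NumericVector p = predict_fun(model, newdata);  // runs predict() in R
    out[i] = p[0];                                  // keep the first prediction
  }
  return out;
}')

# Hypothetical helper: the same loop written in plain R, for comparison.
predict_many_r <- function(predict_fun, model, newdata, n) {
  vapply(seq_len(n), function(i) predict_fun(model, newdata)[[1]], numeric(1))
}

# Stand-in model (assumption): lm() instead of nnet::nnet() or gbm::gbm().
fit <- lm(mpg ~ wt + hp, data = mtcars)
nd  <- mtcars[1:5, c("wt", "hp")]
n   <- 2000

# Measure: wall-clock and CPU time of both loops.
t_r   <- system.time(res_r   <- predict_many_r(predict, fit, nd, n))
t_cpp <- system.time(res_cpp <- predict_many_cpp(predict, fit, nd, n))
print(rbind(R = t_r[1:3], Rcpp = t_cpp[1:3]))

# Profile: see which R functions the time is actually spent in.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
invisible(predict_many_r(predict, fit, nd, n))
Rprof(NULL)
print(head(summaryRprof(prof_file)$by.total, 10))
```

Both versions return the same predictions. Following the points above, one would expect the two timings to be of similar magnitude, because the work inside predict() runs as R code either way and only the thin loop around it is compiled. The Rprof() summary shows where the time actually goes, which is the information needed before deciding whether any part is worth rewriting in C++.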