If one wishes to iterate over a function and add the results to a list in Rcpp, one can use the .push_back()
method, for example:
List appended_list() {
List emptylist = List::create();
for (int i = 0; i < 3; i++) {
emptylist.push_back(i);
}
return emptylist;
}
However, am I correct in saying that the list gets redefined each time and thus this method is very inefficient? Is there a way to instead create a list of a static size (ie. create a list of n elements, where each element is NULL
or NA
or some such)? For example, to do this in R one would write:
emptylist <- list()
length(emptylist) <- 3
for (i in 1:3) {
emptylist[[i]] <- i
}
Is there a way to do this in Rcpp please? Would it be more efficient?
A pre-allocated list will be much faster. If you are unsure how to create a list of pre-determined size, my suggestion is that you probably need to spend some time with introductory Rcpp materials. Some good resources are:
Rcpp
, in particular the Introduction and FAQ vignettes. In addition to accessing them from R, you can find them on the web hereHere is an example showing how much faster avoiding push_back()
can be.
In so.cpp
, we have
#include <Rcpp.h>
// [[Rcpp::export]]
void test1() {
Rcpp::List out;
for ( int i = 0; i < 1000; ++i ) {
out.push_back(i);
}
}
// [[Rcpp::export]]
void test2() {
Rcpp::List out(1000);
for ( int i = 0; i < 1000; ++i ) {
out[i] = i;
}
}
Then we benchmark the functions against each other:
Rcpp::sourceCpp("so.cpp")
library(microbenchmark)
microbenchmark(empty_list = test1(), pre_allocated = test2())
Unit: microseconds
expr min lq mean median uq max
empty_list 3553.549 3755.405 4337.71591 3894.3075 4106.7500 8790.787
pre_allocated 22.089 23.689 38.67364 24.6645 26.1165 1339.443
neval
100
100
So you can see there is a considerable difference there. Of course, in this simplified case, it's a "considerable" difference that wouldn't be noticeable to humans, but in a more complex use case, or something that calls such a function many times, it could be actually "considerable."