Would a statically sized Rcpp empty list be more efficient than list.push_back()?

If one wishes to iterate over a function and add the results to a list in Rcpp, one can use the .push_back() method, for example:

List appended_list() {
    List emptylist = List::create();
    for (int i = 0; i < 3; i++) {
      emptylist.push_back(i);
    }
    return emptylist;
}

However, am I correct in saying that the list gets redefined each time and thus this method is very inefficient? Is there a way to instead create a list of a static size (ie. create a list of n elements, where each element is NULL or NA or some such)? For example, to do this in R one would write:

emptylist <- list()
length(emptylist) <- 3
for (i in 1:3) {
  emptylist[[i]] <- i
}

Is there a way to do this in Rcpp please? Would it be more efficient?

Solution

A pre-allocated list will be much faster. If you are unsure how to create a list of pre-determined size, my suggestion is that you probably need to spend some time with introductory Rcpp materials. Some good resources are:

The vignettes offered by Rcpp, in particular the Introduction and FAQ vignettes. In addition to accessing them from R, you can find them on the web here
Rcpp for everyone
Hadley's chapter on Rcpp

Here is an example showing how much faster avoiding push_back() can be. In so.cpp, we have

#include <Rcpp.h>

// [[Rcpp::export]]
void test1() {
    Rcpp::List out;
    for ( int i = 0; i < 1000; ++i ) {
        out.push_back(i);
    }
}

// [[Rcpp::export]]
void test2() {
    Rcpp::List out(1000);
    for ( int i = 0; i < 1000; ++i ) {
        out[i] = i;
    }
}

Then we benchmark the functions against each other:

Rcpp::sourceCpp("so.cpp")
library(microbenchmark)
microbenchmark(empty_list = test1(), pre_allocated = test2())

Unit: microseconds
          expr      min       lq       mean    median        uq      max
    empty_list 3553.549 3755.405 4337.71591 3894.3075 4106.7500 8790.787
 pre_allocated   22.089   23.689   38.67364   24.6645   26.1165 1339.443
 neval
   100
   100

So you can see there is a considerable difference there. Of course, in this simplified case, it's a "considerable" difference that wouldn't be noticeable to humans, but in a more complex use case, or something that calls such a function many times, it could be actually "considerable."