Search code examples
c++rrcpp

Would a statically sized Rcpp empty list be more efficient than list.push_back()?


If one wishes to iterate over a function and add the results to a list in Rcpp, one can use the .push_back() method, for example:

List appended_list() {
    List emptylist = List::create();
    for (int i = 0; i < 3; i++) {
      emptylist.push_back(i);
    }
    return emptylist;
}

However, am I correct in saying that the list gets redefined each time and thus this method is very inefficient? Is there a way to instead create a list of a static size (ie. create a list of n elements, where each element is NULL or NA or some such)? For example, to do this in R one would write:

emptylist <- list()
length(emptylist) <- 3
for (i in 1:3) {
  emptylist[[i]] <- i
}

Is there a way to do this in Rcpp please? Would it be more efficient?


Solution

  • A pre-allocated list will be much faster. If you are unsure how to create a list of pre-determined size, my suggestion is that you probably need to spend some time with introductory Rcpp materials. Some good resources are:

    Here is an example showing how much faster avoiding push_back() can be. In so.cpp, we have

    #include <Rcpp.h>
    
    // [[Rcpp::export]]
    void test1() {
        Rcpp::List out;
        for ( int i = 0; i < 1000; ++i ) {
            out.push_back(i);
        }
    }
    
    // [[Rcpp::export]]
    void test2() {
        Rcpp::List out(1000);
        for ( int i = 0; i < 1000; ++i ) {
            out[i] = i;
        }
    }
    

    Then we benchmark the functions against each other:

    Rcpp::sourceCpp("so.cpp")
    library(microbenchmark)
    microbenchmark(empty_list = test1(), pre_allocated = test2())
    
    Unit: microseconds
              expr      min       lq       mean    median        uq      max
        empty_list 3553.549 3755.405 4337.71591 3894.3075 4106.7500 8790.787
     pre_allocated   22.089   23.689   38.67364   24.6645   26.1165 1339.443
     neval
       100
       100
    

    So you can see there is a considerable difference there. Of course, in this simplified case, it's a "considerable" difference that wouldn't be noticeable to humans, but in a more complex use case, or something that calls such a function many times, it could be actually "considerable."