Search code examples
c++rloopsrcpprcpparmadillo

Avoid too many "declare same elements of Rcpp List" in nested loops


I would like to ask about how to save computational time about accessing List in nested loop. Here is an Rcpp example including two functions:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
#include <Rcpp.h>

// [[Rcpp::export]]
void first(const Rcpp::List ETA
          ,const int N
          ,const int I
                      ) {   
    for(int i = 0; i < I; ++i) {
        arma::rowvec eta = ETA[i];
        for(int n = 0; n < N; ++n) {
            // use eta to do some calculation
        }
    }
}

// [[Rcpp::export]]
void second(const Rcpp::List ETA
           ,const int N
           ,const int I
                      ) {   
    for(int n = 0; n < N; ++n) {
        for(int i = 0; i < I; ++i) {
            arma::rowvec eta = ETA[i];
            // use eta to do some calculation
        }
    }
}

Compare the time in R:

Rcpp::sourceCpp("test.cpp") # save the above code as "test.cpp"
ETA = list()
N = 10^8
I = 10
for (i in 1:I) ETA[[i]] = rep(0,i) # just an example.
ptm <- proc.time(); first(ETA,N,I); print((proc.time() - ptm))
#    user  system elapsed 
#       0       0       0 
ptm <- proc.time(); second(ETA,N,I); print((proc.time() - ptm))
#    user  system elapsed 
#   16.69    0.00   16.69

Here the ETA is a list whose each element can either have dynamic length (vector) or dynamic dimension (matrix). In this code, the first way is much faster than the second way. But for practical needs, the second way can reduce computational time when there are other variables iterated over n.

Questions:

For either first way or second way, can we declare the eta outside (before) the loops so that we don't need to declare the same eta so many times?


Solution

  • You're doing an implicit conversion and a deep copy during every loop, so it's no surprise that the compiler can't optimize that away.

    What you can do is pre-calculate all the etas and store them in a vector. (I filled in some work within the loop, since empty loops can be optimized out completely).

    // [[Rcpp::export]]
    double third(const Rcpp::List ETA
                  ,const int N
                  ,const int I) {   
      double output=0;
      
      // pre-create eta rowvecs
      std::vector<arma::rowvec> eta_vec(Rf_xlength(ETA));
      for(int i = 0; i < I; ++i) {
        eta_vec[i] = arma::rowvec(Rcpp::NumericVector(ETA[i]));
      }
      
      for(int n = 0; n < N; ++n) {
        for(int i = 0; i < I; ++i) {
          output += sum(eta_vec[i]);
        }
      }
      return output;
    }
    

    Results:

    > ptm <- proc.time(); first(ETA,N,I); print((proc.time() - ptm))
    [1] 0
       user  system elapsed 
      2.761   0.000   2.761 
    > #    user  system elapsed 
    > #       0       0       0 
    > ptm <- proc.time(); second(ETA,N,I); print((proc.time() - ptm))
    [1] 0
       user  system elapsed 
     29.935   0.000  29.930 
    > 
    > ptm <- proc.time(); third(ETA,N,I); print((proc.time() - ptm))
    [1] 0
       user  system elapsed 
      2.599   0.000   2.598