Search code examples
rrcpprcpparmadillo

Translate outer() from base R to RcppArmadillo


Is there any way to efficiently translate the outer() function for multiplication of two vectors from R base to RcppArmadillo? I attempted to do so but it is not efficient at all.

Take the following example:

library(Rcpp)
library(RcppArmadillo)
library(microbenchmark)

#Outer attempt
cppFunction(depends = "RcppArmadillo",
            ' 
  arma::mat outer_rcpp(arma::vec x, arma::vec y) {
    int x_length = x.n_elem;
    int y_length = y.n_elem;
    arma::mat final(x_length, y_length);
  
    // And use loops instead of outer
    for(int i = 0; i < x_length; i++) {
      final.col(i) = x[i] * y;
    }
  
    return(final);
  }
'
)

#Test for equal results
a <- rnorm(5)

base <- base::outer(a, a)
rcpp <- outer_rcpp(a, a)

all.equal(base, rcpp)

#Test for speed

b <- rnorm(5000)

microbenchmark(base = base::outer(b, b),
               rcpp = outer_rcpp(b, b), times = 10)


The results are 2 times slower using R base. I am sure that this can be done though matrix multiplication, any idea how?


Solution

  • As @thelatemail pointed out in the comments, the outer routine is already using a heavily optimized C routine.

    Armadillo itself has its own optimization for addressing matrix multiplication using the dgemm and dgemv routines from LAPACK:

    Playing around with the outerproduct calculations leads to a few optimizations. Mainly, we're opting to move the outer product into armadillo actions instead of loops:

    #include <RcppArmadillo.h>
    // [[Rcpp::depends(RcppArmadillo)]]
    
    // [[Rcpp::export]]
    arma::mat outer_rcpp(const arma::vec& x, const arma::vec& y) {
        int x_length = x.n_elem;
        int y_length = y.n_elem;
        arma::mat final(x_length, y_length);
      
        // And use loops instead of outer
        for(int i = 0; i < x_length; i++) {
          final.col(i) = x[i] * y;
        }
      
        return final;
      }
      
    
    // [[Rcpp::export]]
    arma::mat outer_with_armadillo(const arma::vec& x, const arma::vec& y) {
        arma::mat final = x*y.t();
        return final;
    }
    
    
    // [[Rcpp::export]]
    arma::mat outer_with_armadillo_transposed(const arma::vec& x, const arma::rowvec& y) {
        arma::mat final = x*y;
        return final;
    }
    

    Revisiting the benchmarking code, we have:

    b = rnorm(5000)
    b_tranposed = t(b)
    
    bench_results = microbenchmark::microbenchmark(base = base::outer(b, b),
                   outer_armadillo_loop = outer_rcpp(b, b),
                   outer_armadillo_optimized = outer_with_armadillo(b, b), 
                   outer_armadillo_optimized_transposed = outer_with_armadillo_transposed(b, b_tranposed), times = 10)
    bench_results
    
    expr min lq mean median uq max neval
    base 132.8601 141.3532 156.9979 146.7993 154.8954 234.2619 10
    outer_armadillo_loop 278.4115 279.9204 317.7907 288.4212 329.0769 451.6872 10
    outer_armadillo_optimized 272.4348 283.3380 347.7913 304.1181 339.3282 728.2264 10
    outer_armadillo_optimized_transposed 269.7855 270.7108 297.9580 279.8099 312.3488 386.4270 10

    From the results, the lowest I could achieve is having a pre-transposed b vector from column vector form into row-vector form: (n x 1) * (1 x m)