Search code examples
foreachrcppr-package

Creating a simple Rcpp package with dependency with other Rcpp package


I am trying to improve my loop computation speed by using foreach, but there is a simple Rcpp function I defined inside of this loop. I saved the Rcpp function as mproduct.cpp, and I call out the function simply using

sourceCpp("mproduct.cpp")

and the Rcpp function is a simple one, which is to perform matrix product in C++:

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
  Eigen::MatrixXd C = A * B;
  return Rcpp::wrap(C);
}

So, the function in the Rcpp file is MP, referring to matrix product. I need to perform the following foreach loop (I have simplified the code for illustration):

foreach(j=1:n, .package='Rcpp',.noexport= c("mproduct.cpp"),.combine=rbind)%dopar%{
n=1000000
A<-matrix(rnorm(n,1000,1000))
B<-matrix(rnorm(n,1000,1000))
S<-MP(A,B)
return(S)
} 

Since the size of matrix A and B are large, it is why I want to use foreach to alleviate the computational cost.

However, the above code does not work, since it provides me error message:

task 1 failed - "NULL value passed as symbol address"

The reason I added .noexport= c("mproduct.cpp") is to follow some suggestions from people who solved similar issues (Can't run Rcpp function in foreach - "NULL value passed as symbol address"). But somehow this does not solve my issue.

So I tried to install my Rcpp function as a library. I used the following code:

Rcpp.package.skeleton('mp',cpp_files = "<my working directory>")

but it returns me a warning message:

The following packages are referenced using Rcpp::depends attributes however are not listed in the Depends, Imports or LinkingTo fields of the package DESCRIPTION file: RcppArmadillo, RcppEigen 

so when I tried to install my package using

install.packages("<my working directory>",repos = NULL,type='source')

I got the warning message:

Error in untar2(tarfile, files, list, exdir, restore_times) : 
  incomplete block on file
In R CMD INSTALL
Warning in install.packages :
  installation of package ‘C:/Users/Lenovo/Documents/mproduct.cpp’ had non-zero exit status

So can someone help me out how to solve 1) using foreach with Rcpp function MP, or 2) install the Rcpp file as a package?

Thank you all very much.


Solution

  • The first step would be making sure that you are optimizing the right thing. For me, this would not be the case as this simple benchmark shows:

    set.seed(42)
    n <- 1000
    A<-matrix(rnorm(n*n), n, n)
    B<-matrix(rnorm(n*n), n, n)
    
    MP <- Rcpp::cppFunction("SEXP MP(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
      Eigen::MatrixXd C = A * B;
      return Rcpp::wrap(C);
    }", depends = "RcppEigen")
    
    bench::mark(MP(A, B), A %*% B)[1:5]
    #> # A tibble: 2 x 5
    #>   expression      min   median `itr/sec` mem_alloc
    #>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>
    #> 1 MP(A, B)    277.8ms    278ms      3.60    7.63MB
    #> 2 A %*% B      37.4ms     39ms     22.8     7.63MB
    

    So for me the matrix product via %*% is several times faster than the one via RcppEigen. However, I am using Linux with OpenBLAS for matrix operations while you are on Windows, which often means reference BLAS for matrix operations. It might be that RcppEigen is faster on your system. I am not sure how difficult it is for Windows user to get a faster BLAS implementation (https://csgillespie.github.io/efficientR/set-up.html#blas-and-alternative-r-interpreters might contain some pointers), but I would suggest spending some time on investigating this.

    Now if you come to the conclusion that you do need RcppEigen or RcppArmadillo in your code and want to put that code into a package, you can do the following. Instead of Rcpp::Rcpp.package.skeleton() use RcppEigen::RcppEigen.package.skeleton() or RcppArmadillo::RcppArmadillo.package.skeleton() to create a starting point for a package based on RcppEigen or RcppArmadillo, respectively.