
The equality logical operator in Rcpp is 8 times slower than in base R


Look at the following simple code and operations:

    library(Rcpp)
    library(microbenchmark)

    set.seed(100)
    x <- sample(0:1, 1000000, replace = TRUE)
    y <- sample(0:1, 1000000, replace = TRUE)
    
    cppFunction('LogicalVector is_equal_c(NumericVector x, NumericVector y) {
      return x == y;
    }')
    is_equal_R <- function(x, y) {
      return(x==y)
    }
    mbm <- microbenchmark(c = is_equal_c(x,y),
                          R = is_equal_R(x,y)
    )
    mbm

It gives the following timings:

    Unit: milliseconds
     expr    min     lq      mean  median       uq      max neval cld
        c 6.4132 6.6896 10.961774 11.2421 12.63245 102.5480   100   b
        R 1.2555 1.2994  1.766561  1.3327  1.38220   9.0022   100  a

The plain R equality operator is about 8 times faster than the Rcpp version. Why is that, and is there a way to make the Rcpp code at least as fast as R's vectorised equality operator?


Solution

  • As Florian already hinted, the error is yours: sample(0:1, ...) returns integers, and passing them to a function that takes NumericVector forces a costly copy from int to numeric:

    > class(1:3)
    [1] "integer"
    > class(1:3 + 0)   # Florian's conversion
    [1] "numeric"
    > 
    

    Because integer values are actually 'lighter' than numeric (at 32 vs 64 bit), we may as well stick with integer and modify your C++ function signature accordingly.

    On my computer, C++ then beats R, but both are very close, as you would expect given that the R implementation is already vectorised.

    Modified Code

    Now as a C++ file with embedded R code

    #include <Rcpp.h>
    
    // [[Rcpp::export]]
    Rcpp::LogicalVector is_equal_c(Rcpp::IntegerVector x, Rcpp::IntegerVector y) {
      return x == y;
    }
    
    /*** R
    library(microbenchmark)
    
    set.seed(100)
    x <- sample(0:1, 1000000, replace = TRUE)
    y <- sample(0:1, 1000000, replace = TRUE)
    
    is_equal_R <- function(x, y) {
      return(x==y)
    }
    mbm <- microbenchmark(c = is_equal_c(x,y),
                          R = is_equal_R(x,y))
    mbm
    */
    

    Output

    > Rcpp::sourceCpp("answer.cpp")
    
    > library(microbenchmark)
    
    > set.seed(100)
    
    > x <- sample(0:1, 1000000, replace = TRUE)
    
    > y <- sample(0:1, 1000000, replace = TRUE)
    
    > is_equal_R <- function(x, y) {
    +   return(x==y)
    + }
    
    > mbm <- microbenchmark(c = is_equal_c(x,y),
    +                       R = is_equal_R(x,y))
    
    > mbm
    Unit: milliseconds
     expr     min      lq    mean  median      uq      max neval cld
        c 1.77923 1.82570 2.06075 1.87093 1.93911  4.31854   100   a
        R 1.20529 2.03077 2.23089 2.06222 2.11870 10.89118   100   a
    > 
    

    So in conclusion, your result was one of those headscratchers where one goes "this cannot possibly be true ..." and those are really good learning experiences :)
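To make the cost of the int-to-numeric copy more concrete, here is a rough sketch in plain C++ of what the two signatures amount to. It uses std::vector rather than Rcpp, so it is only an illustration and not how Rcpp is actually implemented; the names equal_direct and equal_via_double are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the IntegerVector signature:
// compares the 32-bit integers directly in a single pass, no temporaries.
// The result holds 0/1 ints, since R logicals are stored as ints as well.
std::vector<int> equal_direct(const std::vector<int>& x,
                              const std::vector<int>& y) {
    assert(x.size() == y.size());
    std::vector<int> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = (x[i] == y[i]) ? 1 : 0;
    }
    return out;
}

// Hypothetical stand-in for the NumericVector signature:
// both inputs are first copied into freshly allocated double vectors
// (the coercion step), and only then compared.
std::vector<int> equal_via_double(const std::vector<int>& x,
                                  const std::vector<int>& y) {
    assert(x.size() == y.size());
    std::vector<double> xd(x.begin(), x.end());  // extra allocation + full pass
    std::vector<double> yd(y.begin(), y.end());  // extra allocation + full pass
    std::vector<int> out(x.size());
    for (std::size_t i = 0; i < xd.size(); ++i) {
        out[i] = (xd[i] == yd[i]) ? 1 : 0;
    }
    return out;
}
```

Both functions return the same answer. The second one just allocates and fills two extra double vectors first, which for 1,000,000 elements each is about 16 MB of additional memory traffic (2 × 1,000,000 × 8 bytes), and that is plausibly where most of the slowdown in the question comes from.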