Look at the following simple code and operations:
library(Rcpp)
library(microbenchmark)
set.seed(100)
x <- sample(0:1, 1000000, replace = TRUE)
y <- sample(0:1, 1000000, replace = TRUE)
cppFunction('LogicalVector is_equal_c(NumericVector x, NumericVector y) {
return x == y;
}')
is_equal_R <- function(x, y) {
return(x==y)
}
mbm <- microbenchmark(c = is_equal_c(x,y),
R = is_equal_R(x,y)
)
mbm
This gives the following timings:
Unit: milliseconds
expr min lq mean median uq max neval cld
c 6.4132 6.6896 10.961774 11.2421 12.63245 102.5480 100 b
R 1.2555 1.2994 1.766561 1.3327 1.38220 9.0022 100 a
The plain R equality operator is about 8 times faster than the Rcpp version. Why is that, and is there a way to make the Rcpp code at least as fast as R's plain vector equality operator?
As Florian already hinted, the error is yours: sample(0:1, ...) returns integer vectors, and passing them to a function that takes NumericVector forces a costly copy from int to numeric:
> class(1:3)
[1] "integer"
> class(1:3 + 0) # Florian's conversion
[1] "numeric"
>
Because integer values are actually 'lighter' than numeric (at 32 vs 64 bit), we may as well stick with integer and modify your C++ function signature accordingly.
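To make the cost concrete, here is a minimal standalone sketch in plain C++ (no Rcpp involved). It only imitates what happens when an integer vector is handed to a function declared with NumericVector: each input is first converted into a newly allocated buffer of doubles, and only then compared. The function names to_double_copy, equal_int and equal_via_double are made up for this illustration and are not part of Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the int -> numeric coercion: allocates a new
// buffer and converts every element. For n elements this writes
// n * sizeof(double) bytes before any comparison has happened.
std::vector<double> to_double_copy(const std::vector<std::int32_t>& v) {
    return std::vector<double>(v.begin(), v.end());
}

// Element-wise equality directly on the integers: no conversion, no copy.
std::vector<bool> equal_int(const std::vector<std::int32_t>& x,
                            const std::vector<std::int32_t>& y) {
    assert(x.size() == y.size());
    std::vector<bool> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = (x[i] == y[i]);
    }
    return out;
}

// Same result, but both inputs are copied to double first, which is
// roughly the extra work the NumericVector signature asks for.
std::vector<bool> equal_via_double(const std::vector<std::int32_t>& x,
                                   const std::vector<std::int32_t>& y) {
    assert(x.size() == y.size());
    const std::vector<double> xd = to_double_copy(x);  // extra allocation 1
    const std::vector<double> yd = to_double_copy(y);  // extra allocation 2
    std::vector<bool> out(xd.size());
    for (std::size_t i = 0; i < xd.size(); ++i) {
        out[i] = (xd[i] == yd[i]);
    }
    return out;
}
```

Both functions return the same answer; the second one just does two extra allocations and conversions on the way, and with a million elements per vector that is what dominates the timing.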
On my computer, C++ then beats R, but both are very close, as you would expect for an operation that is already vectorised in R.
Now as a C++ file with embedded R code:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::LogicalVector is_equal_c(Rcpp::IntegerVector x, Rcpp::IntegerVector y) {
return x == y;
}
/*** R
library(microbenchmark)
set.seed(100)
x <- sample(0:1, 1000000, replace = TRUE)
y <- sample(0:1, 1000000, replace = TRUE)
is_equal_R <- function(x, y) {
return(x==y)
}
mbm <- microbenchmark(c = is_equal_c(x,y),
R = is_equal_R(x,y))
mbm
*/
> Rcpp::sourceCpp("answer.cpp")
> library(microbenchmark)
> set.seed(100)
> x <- sample(0:1, 1000000, replace = TRUE)
> y <- sample(0:1, 1000000, replace = TRUE)
> is_equal_R <- function(x, y) {
+   return(x==y)
+ }
> mbm <- microbenchmark(c = is_equal_c(x,y),
+                       R = is_equal_R(x,y))
> mbm
Unit: milliseconds
expr min lq mean median uq max neval cld
c 1.77923 1.82570 2.06075 1.87093 1.93911 4.31854 100 a
R 1.20529 2.03077 2.23089 2.06222 2.11870 10.89118 100 a
>
So in conclusion, your result was one of those headscratchers where one goes "this cannot possibly be true ..." and those are really good learning experiences :)