Why a self-written Rcpp vectorized mathematical function is faster than its base counterpart?

OK, I know the answer, but being inspired by this question, I'd like to get some nice opinions about the following: Why the Rcpp exercise below is ca. 15% faster (for long vectors) than the built-in exp()? We all know that Rcpp is a wrapper to the R/C API, so we should expect a slightly worse performance.

Rcpp::cppFunction("
   NumericVector exp2(NumericVector x) {
      NumericVector z = Rcpp::clone(x);
      int n = z.size();
      for (int i=0; i<n; ++i)
         z[i] = exp(z[i]);
      return z;
   }
")

library("microbenchmark")
x <- rcauchy(1000000)
microbenchmark(exp(x), exp2(x), unit="relative")
## Unit: relative
##     expr      min       lq   median       uq      max neval
##   exp(x) 1.159893 1.154143 1.155856 1.154482 0.926272   100
##  exp2(x) 1.000000 1.000000 1.000000 1.000000 1.000000   100

Solution

Base R tends to do more checking for NA so we can win a little by not doing that. Also note that by doing tricks like loop unrolling (as done in Rcpp Sugar) we can do little better still.

So I added

Rcpp::cppFunction("NumericVector expSugar(NumericVector x) { return exp(x); }")

and with that I get a further gain -- with less code on the user side:

R> microbenchmark(exp(x), exp2(x), expSugar(x), unit="relative")
Unit: relative
        expr     min      lq    mean  median      uq     max neval
      exp(x) 1.11190 1.11130 1.11718 1.10799 1.08938 1.02590   100
     exp2(x) 1.08184 1.08937 1.07289 1.07621 1.06382 1.00462   100
 expSugar(x) 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000   100
R>