I am trying to get a better understanding of how the Rcpp proxy model works.
For this, consider the following task: sample exponential random variables and do something with the result. A naive Rcpp implementation could be:
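#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix rmexp1(int n, int d) {
    NumericMatrix out(n, d);
    NumericVector values;
    for (int k = 0; k < n; k++) {
        values = Rcpp::rexp(d);   // draw d exponential variates (rate 1)
        // do something with values
        out(k, _) = values;       // store them as row k of the result
    }
    return out;
}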
Are the following statements correct?
Rcpp::rexp(d) allocates space for a new R vector, then values stores the reference to that and discards the reference it previously held.
values are hard-copied into out(k, _), since left- and right-hand-side datatypes are different.

Let's approach this experimentally. How much memory is allocated by R and how long does that take? First, let's use your function and run it with different arguments. I am wrapping this in bench::mark, since this gives me both RAM and CPU measurements:
> bench::mark(rmexp1(100, 10),
+ rmexp1(100, 100),
+ rmexp1(100, 1000),
+ rmexp1(100, 10000),
+ check = FALSE)
#> # A tibble: 4 x 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#> 1 rmexp1(100, 10) 46.93µs 52.61µs 16307. 10.35KB 8.24 7918 4
#> 2 rmexp1(100, 100) 381.41µs 538.42µs 1786. 3.9MB 4.14 863 2
#> 3 rmexp1(100, 1000) 4.83ms 5.08ms 187. 1.53MB 8.68 86 4
#> 4 rmexp1(100, 10000) 59.85ms 63.19ms 15.5 15.27MB 5.17 6 2
#> # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> # time <list>, gc <list>
Unsurprisingly, a larger matrix takes longer and requires more memory. In addition, the allocated memory is about twice as large as the memory required for the output matrix. So yes, we are allocating more memory than is needed here.
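The "about twice" figure can be checked with a bit of arithmetic. The following is a minimal sketch in plain C++ (not Rcpp), assuming 8-byte doubles and ignoring the small per-object header that R adds to every vector; the function names matrix_bytes and total_bytes are made up for this illustration.

```cpp
#include <cstddef>

// Bytes needed for the data of an n x d matrix of doubles
// (R's per-object header is ignored here).
constexpr std::size_t matrix_bytes(std::size_t n, std::size_t d) {
    return n * d * sizeof(double);
}

// Bytes for the output matrix plus n temporary row vectors of length d,
// i.e. one fresh vector per loop iteration.
constexpr std::size_t total_bytes(std::size_t n, std::size_t d) {
    return matrix_bytes(n, d) + n * d * sizeof(double);
}
```

For n = 100 and d = 1000 this gives 800,000 bytes for the matrix alone and 1,600,000 bytes in total, which is about 1.53 MB when counted in units of 1024 * 1024 bytes, in line with the mem_alloc column for rmexp1(100, 1000).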
Is that performance critical? It depends. After all, you are creating random variates with an exponential distribution, which itself takes time. In addition, you are doing some unspecified computation in "do something with values", which might take even longer. Let's get rid of creating random variates by using alternative functions which only allocate memory, with or without initializing it to zero:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix rmzero(int n, int d) {
    NumericMatrix out(n, d);
    NumericVector values;
    for (int k = 0; k < n; k++) {
        values = Rcpp::NumericVector(d);   // new vector, initialized to zero
        // do something with values
        out(k, _) = values;
    }
    return out;
}
// [[Rcpp::export]]
NumericMatrix rmnoinit(int n, int d) {
    NumericMatrix out(n, d);
    NumericVector values;
    for (int k = 0; k < n; k++) {
        values = Rcpp::NumericVector(Rcpp::no_init(d));   // new vector, left uninitialized
        // do something with values
        out(k, _) = values;
    }
    return out;
}
With bench::mark we get:
> bench::mark(rmexp1(100, 1000),
+ rmzero(100, 1000),
+ rmnoinit(100, 1000),
+ check = FALSE)
#> # A tibble: 3 x 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#> 1 rmexp1(100, 1000) 4.83ms 5.05ms 190. 1.53MB 8.72 87 4
#> 2 rmzero(100, 1000) 509.74µs 562.24µs 1510. 1.53MB 60.4 525 21
#> 3 rmnoinit(100, 1000) 404.24µs 469.43µs 1785. 1.53MB 53.8 664 20
#> # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> # time <list>, gc <list>
So roughly only 1/10 of the execution time of your function is due to memory allocation and other overhead. The rest comes from the random variates.
If generating random variates is the actual bottleneck in your code, you might be interested in my dqrng package:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::depends(dqrng)]]
#include <dqrng.h>
// [[Rcpp::export]]
NumericMatrix rmdqexp1(int n, int d) {
    NumericMatrix out(n, d);
    NumericVector values;
    for (int k = 0; k < n; k++) {
        values = dqrng::dqrexp(d);   // exponential variates from dqrng
        // do something with values
        out(k, _) = values;
    }
    return out;
}
With bench::mark we get:
> bench::mark(rmexp1(100, 1000),
+ rmdqexp1(100, 1000),
+ check = FALSE)
#> # A tibble: 2 x 13
#> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
#> <bch:expr> <bch:> <bch:> <dbl> <bch:byt> <dbl> <int> <dbl>
#> 1 rmexp1(100, 1000) 3.69ms 5.03ms 201. 1.53MB 6.36 95 3
#> 2 rmdqexp1(100, 1000) 1.09ms 1.21ms 700. 1.65MB 22.6 310 10
#> # … with 5 more variables: total_time <bch:tm>, result <list>, memory <list>,
#> # time <list>, gc <list>
Quite a bit of time can be saved by using a faster random number generator.
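To connect the measurements back to the two statements in the question, here is a toy model in plain C++ (not Rcpp itself, and not how Rcpp is implemented internally). ToyVector and ToyMatrix are hypothetical names invented for this illustration. The assumption is that a NumericVector behaves like a thin handle to an R vector, so that assigning to it re-seats the handle, while assigning into a matrix row copies element by element into the matrix's own column-major storage.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for Rcpp::NumericVector: a cheap handle to shared storage.
// Copying or assigning a ToyVector copies the handle, not the numbers.
struct ToyVector {
    std::shared_ptr<std::vector<double>> data;
    explicit ToyVector(std::size_t n = 0, double fill = 0.0)
        : data(std::make_shared<std::vector<double>>(n, fill)) {}
    std::size_t size() const { return data->size(); }
    double& operator[](std::size_t i) { return (*data)[i]; }
};

// Toy stand-in for Rcpp::NumericMatrix with column-major storage, as in R.
struct ToyMatrix {
    std::size_t nrow, ncol;
    std::vector<double> data;
    ToyMatrix(std::size_t n, std::size_t d)
        : nrow(n), ncol(d), data(n * d, 0.0) {}
    double operator()(std::size_t i, std::size_t j) const {
        return data[i + j * nrow];
    }
    // Plays the role of out(k, _) = values: an element-wise copy,
    // because the elements of a row are nrow apart in memory.
    void set_row(std::size_t k, const ToyVector& values) {
        assert(values.size() == ncol);
        for (std::size_t j = 0; j < ncol; ++j)
            data[k + j * nrow] = (*values.data)[j];
    }
};
```

In this model, `values = ToyVector(d)` makes `values` refer to freshly allocated storage and releases the old storage once no other handle refers to it, and `out.set_row(k, values)` leaves `out` unaffected by later changes to `values`. That picture is consistent with both statements and with the roughly doubled mem_alloc measured above: one allocation for the matrix plus one temporary vector per row.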