Good afternoon,
I have been trying to reproduce R's x[200:300] subsetting from within Rcpp. (This is not the actual problem I am trying to solve, but I need to subset many ranges inside the functions I am writing in C++, and I found that this was my performance bottleneck.)
However, although I have tried the methods in Rcpp, using iterators and other approaches, I can't seem to find a solution that is even minimally fast. Most of the solutions I find are very slow.
Looking through the Rcpp reference I can't find anything, nor can I find it on Stack Exchange.
I know this code is pretty ugly right now, but I am just clueless:
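#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
StringVector range_test_(StringVector& x, int i, int j) {
  // Builds a new vector from the zero-based, half-open range [i, j)
  StringVector vect(x.begin() + i, x.begin() + j);
  return vect;
}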
This turns out to be about 800 times slower than R. I have been trying to find an equivalent of R's x[i:j], which is very fast, in the Rcpp API, but I can't find it.
tests_range <- rbenchmark::benchmark(
x[200:3000],
range_test_(x, 200, 3000),
order = NULL,
replications = 80
)[,1:4]
This gives the following result:
                       test replications elapsed relative
1               x[200:3000]           80   0.001        1
3 range_test_(x, 200, 3000)           80   0.822      822
If anybody knows how to access the subsetting function x[i:j], or something as fast, from within Rcpp, I would really appreciate it. I just can't seem to find the tool I am missing.
The issue is that the iterator constructor makes a copy. See this page:
"Copy the data between iterators first and last to the created vector"
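To illustrate what "copy the data between iterators" means, here is a minimal sketch in plain C++. It uses std::vector<std::string> as a stand-in for Rcpp::StringVector, which is an assumption made so the example compiles without R; the names copy_range and copy_is_independent are hypothetical, and the sketch says nothing about how expensive the copy is inside Rcpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for an iterator-range constructor: builds a new vector from the
// zero-based, half-open range [first, last), copying every element.
std::vector<std::string> copy_range(const std::vector<std::string>& x,
                                    std::size_t first, std::size_t last) {
    return std::vector<std::string>(x.begin() + first, x.begin() + last);
}

// Returns true when modifying the copy leaves the source untouched,
// i.e. the new vector owns its own elements.
bool copy_is_independent() {
    std::vector<std::string> x = {"NHVFQ", "XMEOF", "DABUT"};
    std::vector<std::string> y = copy_range(x, 0, 2);
    y[0] = "changed";
    return x[0] == "NHVFQ" && y.size() == 2;
}
```

Every element in the range is visited and copied once, so the cost grows with the length of the range.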
However, you can try this instead:
#include <Rcpp.h>
// [[Rcpp::export]]
Rcpp::StringVector in_range(Rcpp::StringVector &x, int i, int j) {
return x[Rcpp::Range(i - 1, j - 1)]; // zero indexed
}
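The index arithmetic is easy to get wrong, so here is a small sketch of the same mapping written with an iterator pair. R's x[i:j] is 1-based and includes both ends, while an iterator range is zero-based and excludes its end, so x[i:j] corresponds to [begin + (i - 1), begin + j). By the same reasoning, a range built from begin() + i to begin() + j, as in the question, yields R's x[(i + 1):j]. The helper r_slice is hypothetical, and std::vector<std::string> again stands in for Rcpp::StringVector so the example compiles without R.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: R-style x[i:j] (1-based, both ends inclusive)
// expressed as a zero-based, half-open iterator range.
std::vector<std::string> r_slice(const std::vector<std::string>& x,
                                 std::size_t i, std::size_t j) {
    // Only ascending, in-bounds ranges are handled in this sketch.
    assert(i >= 1 && i <= j && j <= x.size());
    return std::vector<std::string>(x.begin() + (i - 1), x.begin() + j);
}
```

For a five-element vector a, b, c, d, e, r_slice(x, 2, 4) returns b, c, d, the same elements R returns for x[2:4].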
The time taken is a lot closer:
> set.seed(20597458)
> x <- replicate(1e3, paste0(sample(LETTERS, 5), collapse = ""))
> head(x)
[1] "NHVFQ" "XMEOF" "DABUT" "XKTAZ" "NQXZL" "NPJLM"
>
> stopifnot(all.equal(in_range(x, 100, 200), x[100:200]))
>
> library(microbenchmark)
> microbenchmark(in_range(x, 100, 200), x[100:200], times = 1e4)
Unit: nanoseconds
                  expr  min   lq     mean median   uq     max neval
 in_range(x, 100, 200) 1185 1580 3669.780   1581 1976 3263205 10000
            x[100:200]  790  790 1658.571   1185 1186 2331256 10000
Note that there is a page here on subsetting. I could not find a relevant example there, though.