
Rcpp subsetting contiguous StringVector


Good afternoon,

I have been trying to reproduce R's x[200:300] subsetting in Rcpp. (Note: this is not the actual problem I am trying to solve, but I need to subset many ranges inside the functions I am writing in C++, and I found that this was my performance bottleneck.)

However, although I have tried the methods in Rcpp, using iterators and other approaches, I can't find a solution that is even minimally "fast." Most of the solutions I find are very slow.

Looking through the Rcpp reference I can't find anything, nor can I find it on StackExchange.

I know this code is pretty ugly right now... But I am just clueless

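#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
StringVector range_test_( StringVector& x, int i, int j){
    // Copies the zero-based, half-open iterator range [i, j), i.e. R's x[(i + 1):j]
    StringVector vect(x.begin()+i, x.begin()+j);
    return vect;
}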

With this, it is about 800 times slower. I have been trying to find the equivalent of R's x[i:j], which is very fast, within Rcpp... but I can't find it.

 tests_range <- rbenchmark::benchmark(
    x[200:3000],
    range_test_(x, 200, 3000),
    order = NULL,
    replications = 80
)[,1:4]

This gives the following result:

                             test replications elapsed relative
1                     x[200:3000]           80   0.001        1
3       range_test_(x, 200, 3000)           80   0.822      822

If anybody knows how to access the subsetting function x[i:j] or something as fast within Rcpp I would really appreciate it. I just can't seem to find the tool I am missing.


Solution

  • The issue is that the iterator constructor makes a copy. The documentation for that constructor says:

    Copy the data between iterators first and last to the created vector

    However, you can try this instead:

    #include <Rcpp.h>
    
    // [[Rcpp::export]]
    Rcpp::StringVector in_range(Rcpp::StringVector &x, int i, int j) {
      // i and j are 1-based as in R; Rcpp::Range is zero-based and inclusive at both ends
      return x[Rcpp::Range(i - 1, j - 1)];
    }
    

    With this, the time taken is a lot closer to R's:

    > set.seed(20597458)
    > x <- replicate(1e3, paste0(sample(LETTERS, 5), collapse = ""))
    > head(x)
    [1] "NHVFQ" "XMEOF" "DABUT" "XKTAZ" "NQXZL" "NPJLM"
    > 
    > stopifnot(all.equal(in_range(x, 100, 200), x[100:200]))
    > 
    > library(microbenchmark)
    > microbenchmark(in_range(x, 100, 200), x[100:200], times = 1e4)
    Unit: nanoseconds
                      expr  min   lq     mean median   uq     max neval
     in_range(x, 100, 200) 1185 1580 3669.780   1581 1976 3263205 10000
                x[100:200]  790  790 1658.571   1185 1186 2331256 10000
    

    Note that there is a page here on subsetting. I could not find a relevant example there, though.
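
One thing worth checking when comparing the two functions is the index arithmetic. range_test_(x, 200, 3000) copies the zero-based, half-open range [200, 3000), which is R's x[201:3000] (2800 elements), while x[200:3000] has 2801 elements. in_range avoids this by subtracting 1 from both ends before building the inclusive Rcpp::Range. Below is a minimal sketch of the same mapping in plain standard C++ so it can be tried without R. It is not Rcpp code, and slice_r_style is a hypothetical helper name made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): copy the elements that R would
// return for x[i:j], where i and j are 1-based and both inclusive.
std::vector<std::string> slice_r_style(const std::vector<std::string>& x,
                                       std::size_t i, std::size_t j) {
  // Reject ranges that R-style 1-based inclusive indexing cannot express here.
  if (i < 1 || j < i || j > x.size()) {
    throw std::out_of_range("need 1 <= i <= j <= x.size()");
  }
  // 1-based inclusive [i, j] maps to the zero-based half-open range [i - 1, j).
  auto first = x.begin() + static_cast<std::ptrdiff_t>(i - 1);
  auto last = x.begin() + static_cast<std::ptrdiff_t>(j);
  return std::vector<std::string>(first, last);
}
```

For a five-element vector, slice_r_style(x, 2, 4) returns the second, third and fourth elements, matching x[2:4] in R. This still copies the elements, just as the iterator constructor does, so it only illustrates the indexing, not the timing.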