Search code examples
rrcpprcpparmadillorcppparallel

Not getting all() quicker using Rcpp


As I'm a bit new to Rcpp, I might be missing a trick here.

Let's create two matrices:

library(Rcpp)
library(microbenchmark)

P <- matrix(0, 200,500)
for(i in 1:500) P[,i] <- rep(rep(sample(0:1), 2), 25)
Parent_Check <- matrix(0, nrow(P), nrow(P))

I now want the following done:

Test1 <- function(){
  for (i in 1:nrow(P)) {
    Parent_Check[i,] <- apply(P, 1, function(x) all(x == P[i,]))
  }

}
Test1()

I then created a Rcpp version for all() hoping to improve speed, defined as:

Rcpp::cppFunction(

'bool all_C(LogicalVector x) {
  // Note the use of is_true to return a bool type.
  return is_true(all(x == TRUE));
 }
'  
)

Checking the speeds using all_C, it proves to be slower:

Test2 <- function(){
  for (i in 1:nrow(P)) {
    Parent_Check[i,] <- apply(P, 1, function(x) all_C(x == P[i,]))
  }
  Parent_Check
}

microbenchmark::microbenchmark(Test1(), Test2(), times = 10)
    expr      min       lq     mean   median       uq      max neval
 Test1() 467.9671 471.1590 488.1784 479.4830 485.4755 578.5338    10
 Test2() 544.6561 552.7025 587.8888 570.4416 641.1202 657.7581    10

Trouble is, all_C() is slower than all(), so I suspect the slow speed for Test2() requires a better all_C call as well as a way of avoid apply in the above example.

I tried rewriting apply in Rcpp using this answer, but using this Rcpp apply function makes it even slower.

Any ideas on how to improve the speed of Test1() using Rcpp?


Solution

  • As has been mentioned in the comments, trying to get a faster all() is not likely to help here. Rather, you would want to move the loops into C++. Doing that will also give you more control: for example, you can avoid always comparing all elements of the rows and instead short-circuit on the first element that isn’t equal.

    Here’s my stab at what a faster solution could look like:

    Rcpp::cppFunction('
    // For all rows, check if it is equal to all other rows
    LogicalMatrix f2(const NumericMatrix& x) {
      size_t n = x.rows();
      size_t p = x.cols();
      LogicalMatrix result(n, n);
    
      for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
          bool rows_equal = true;
    
          for (size_t k = 0; k < p; k++) {
            if (x(i, k) != x(j, k)) {
              rows_equal = false;
              break;
            }
          }
    
          result(i, j) = rows_equal;
          result(j, i) = rows_equal;
        }
        result(i, i) = true;
      }
    
      return result;
    }
    ')
    

    Original implementation:

    set.seed(4)
    
    P <- matrix(0, 200,500)
    for(i in 1:500) P[,i] <- rep(rep(sample(0:1), 2), 25)
    
    f1 <- function(P) {
      Parent_Check <- matrix(0, nrow(P), nrow(P))
      for (i in 1:nrow(P)) {
        Parent_Check[i,] <- apply(P, 1, function(x) all(x == P[i,]))
      }
      Parent_Check
    }
    

    And results:

    bench::mark(f1(P), f2(P) * 1)
    #> Warning: Some expressions had a GC in every iteration; so filtering is
    #> disabled.
    #> # A tibble: 2 x 6
    #>   expression      min   median `itr/sec` mem_alloc `gc/sec`
    #>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
    #> 1 f1(P)      736.18ms 736.18ms      1.36     697MB    27.2 
    #> 2 f2(P) * 1    6.37ms   6.95ms    134.       471KB     1.96