Search code examples
loopsindexingrcpprcpparmadillo

RcppArmadillo: Negative indexing in for-loop


I'm new to Rcpp and am trying to perform computations based on negative indexing in a for()-loop using RcppArmadillo. I already found out that negative indexing in RcppArmadillo is not so straightforward, but that it can be done via the vector of elements that should be kept (as I found here). That seems a bit difficult to me when the element to remove is the looping index. I tried to implement the last approach in this answer, but didn't succeed. Is there an easy way to specify the vector of elements excluding the element with the index of the loop?

So, I'm trying to find an equivalent in RcppArmadillo for y[-i] in the following MWE:

# In R:
# Input 
n <- 10
y <- 1:20

# Computation
x <- rep(NA, n)
for(i in 1:n){
x[i] <- sum(y[-i])
}

My code Rcpp code so far:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>   

// [[Rcpp::export]]
arma::vec rcpp_sum (arma::vec y, int n){

arma::vec x(n);
arma::vec ind(n);

for (i=0; i<n; i++){
ind[i] = /*no idea...*/
x[i] = sum(y[ind]);
}

return x;
}

Any help is greatly appreciated!


Solution

  • For task like this, it might be best to just skip the index in question. In standard C++ we would check for the index and skip it. Something like this:

    // [[Rcpp::export]]
    arma::vec rcpp_sum (arma::vec y, int n){
        
        arma::vec x(n);
        
        for (int i = 0; i < n; i++) {
            x[i] = 0; // Initialize value
            
            for (int j = 0; j < y.size(); ++j) {
                if (i != j) {
                    x[i] += y[j];
                }
            }
        }
        
        return x;
    }
    

    In the above, we are moving away from the sugar syntax. IMO that's okay in cases like this as the alternate isn't overly complicated. While we are simplifying, the dependency on RcppArmadillo isn't necessary as we can just use pure Rcpp

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    NumericVector pure_rcpp_sum (NumericVector y, int n){
        
        NumericVector x(n);
        
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < y.size(); ++j) {
                if (i != j) {
                    x[i] += y[j];
                }
            }
        }
        
        return x;
    }
    

    Verifying the output:

    all.equal(as.vector(rcpp_sum(y, n)), x)
    [1] TRUE
    
    all.equal(pure_rcpp_sum(y, n), x)
    [1] TRUE
    

    Update

    Per the OP's request, we have an optimized approach in base R for this specific purpose. The above demonstrates how to attack the very specific problem of only summing values from a vector leaving out one value in C++. This was meant to be pedagogical in nature and not necessarily the best way possible for this specific task (as we will show below).

    Before we show the simple R code, I would like to point out that the OP's concerns of having a simple conditional statement in an inner loop in C++ should not be feared (as is the case in base R). From what I could tell, subsetting as was demonstrated in the links from the OP, is O(n) and has the additional overhead of an extra logical vector. Our proposed solution above should be more efficient as it essentially does the same thing without the extra object.

    Now, for updated code:

    baseR <- function(y, n) {
        mySum <- sum(y)
        vapply(1:n, function(x) mySum - y[x], FUN.VALUE = 1)    
    }
    
    ## Here is the OP code for reference
    OP <- function(y, n) {
        x <- rep(NA, n)
        for(i in 1:n) {x[i] <- sum(y[-i])}
        x
    }
    

    That's it. It's lightning fast as well:

    huge_y <- rnorm(1e6)
    huge_n <- 1e3
    
    system.time(t1 <- baseR(huge_y, huge_n))
     user  system elapsed 
     0.003   0.000   0.003
    
    system.time(t2 <- pure_rcpp_sum(huge_y, huge_n))
     user  system elapsed 
    2.776   0.003   2.779 
    
    system.time(t3 <- OP(huge_y, huge_n))
     user  system elapsed 
    9.555   1.248  10.805
    
    all.equal(t1, t2)
    [1] TRUE
    
    all.equal(t1, t3)
    [1] TRUE