Search code examples
c++rrcpp

Rcpp select/subset NumericMatrix column by a NumericVector


I can select all the rows of a matrix and a range of columns of a matrix as follows:

library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x){
  return x(_, Range(0, 1));
}
')

However, I would like to select columns based on a NumericVector y which, for instance, could be something like c(0, 1, 0, 0, 1). I tried this:

library(Rcpp)
cppFunction('
NumericMatrix subset(NumericMatrix x, NumericVector y){
  return x(_, y);
}
')

but it doesn't compile. How do I do it?


Solution

  • Alas, Rcpp doesn't have great support for non-contiguous views or selecting in a single statement only columns 1 and 4. As you saw, selecting contiguous views or selecting all columns is accessible with Rcpp::Range(). You'll likely want to upgrade to RcppArmadillo for better control over matrix subsets.

    RcppArmadillo subset examples

    #include <RcppArmadillo.h>
    // [[Rcpp::depends(RcppArmadillo)]]
    
    // [[Rcpp::export]]
    arma::mat matrix_subset_idx(const arma::mat& x,
                                const arma::uvec& y) { 
    
        // y must be an integer between 0 and columns - 1
        // Allows for repeated draws from same columns.
        return x.cols( y );
    }
    
    
    // [[Rcpp::export]]
    arma::mat matrix_subset_logical(const arma::mat& x,
                                    const arma::vec& y) { 
        // Assumes that y is 0/1 coded.
        // find() retrieves the integer index when y is equivalent 1. 
        return x.cols( arma::find(y == 1) );
    }
    

    Test

    # Sample data
    x = matrix(1:15, ncol = 5)
    x
    #      [,1] [,2] [,3] [,4] [,5]
    # [1,]    1    4    7   10   13
    # [2,]    2    5    8   11   14
    # [3,]    3    6    9   12   15
    
    # Subset only when 1 (TRUE) is found:
    matrix_subset_logical(x, c(0, 1, 0, 0, 1))
    #      [,1] [,2]
    # [1,]    4   13
    # [2,]    5   14
    # [3,]    6   15
    
    # Subset with an index representing the location
    # Note: C++ indices start at 0 not 1!
    matrix_subset_idx(x, c(1, 3))
    #      [,1] [,2]
    # [1,]    4   13
    # [2,]    5   14
    # [3,]    6   15
    

    Pure Rcpp logic

    If you do not want to take on the dependency of armadillo, then the equivalent for the matrix subset in Rcpp is:

    #include <Rcpp.h>
    
    // [[Rcpp::export]]
    Rcpp::NumericMatrix matrix_subset_idx_rcpp(
            Rcpp::NumericMatrix x, Rcpp::IntegerVector y) { 
    
        // Determine the number of observations
        int n_cols_out = y.size();
    
        // Create an output matrix
        Rcpp::NumericMatrix out = Rcpp::no_init(x.nrow(), n_cols_out);
    
        // Loop through each column and copy the data. 
        for(unsigned int z = 0; z < n_cols_out; ++z) {
            out(Rcpp::_, z) = x(Rcpp::_, y[z]);
        }
    
        return out;
    }