Search code examples
rrcpparmadillo

get an element from a subset of a matrix in Rarmadillo


I have a large-ish matrix. I'm trying to sample from it with dynamically changing weights. As it's forced to use loops in R, I'm trying to implement it in Rcpp so it has the chance of running a bit faster. After a bit of experimenting, I think I've figured out how to grab an index at random with the correct weights.

The trick is that I'm only sampling from a subset of columns at any given time (this can change to rows if it's more efficient in C - the matrix is actually symmetric). My indices are only defined for this subset of columns. In R, I'd do something along the lines of

large_matrix[, columns_of_interest][index]

and this works fine. How would I do the equivalent using Rcpp/Armadillo? My guess of

cppFunction("arma::vec element_from_subset(arma::mat D, arma::uvec i, arma::uvec columns) {
  # arma::mat D_subset = D.cols(columns);
  return D.cols(columns).elem(i);

  }", depends = "RcppArmadillo")

fails to compile (and .at instead of .elem doesn't work either, nor does the standard R trick of surrounding things in paranthesis.

This does work, but is what I'm trying to avoid:

cppFunction("arma::vec element_from_subset(arma::mat D, arma::uvec i, arma::uvec columns) {
  arma::mat D_subset = D.cols(columns);
  return D_subset.elem(i);

  }", depends = "RcppArmadillo")

Is there any way to accommplish this without needing to save D.cols(columns)?


Solution

  • Short answer: No.

    But, the problem is phrased incorrectly. Think about what is happening here:

    (M <- matrix(1:9, 3, 3)) 
    #>      [,1] [,2] [,3]
    #> [1,]    1    4    7
    #> [2,]    2    5    8
    #> [3,]    3    6    9
    
    columns_of_interest = 1:2
    M[, columns_of_interest] 
    #>      [,1] [,2]
    #> [1,]    1    4
    #> [2,]    2    5
    #> [3,]    3    6 
    

    From here, if we have the index being 1, then we get:

    index = 1
    M[, columns_of_interest][index]
    #> 1
    

    So, in essence, what's really happening is an entry-wise subset of (i,j). Thus, you should just use:

    Rcpp::cppFunction("double element_from_subset(arma::mat D, int i, int j) { 
                      return D(i, j);
                      }", depends = "RcppArmadillo")
    element_from_subset(M, 0, 0)
    #> [1] 1
    

    I say this based on the R and C++ code posted, e.g. R gives 1 value and C++ has a return type permitting only one value.


    The code posted by OP is shown without the error. The initial error as compiled will indicate there is an issue using an Rcpp object inside of an arma class. If we correct the types, e.g. replacing Rcpp::IntegerVector with an arma appropriate type of either arma::ivec or arma::uvec, then compiling yields a more informative error message.

    Corrected Code:

    Rcpp::cppFunction("double element_from_subset(arma::mat D, int i, arma::uvec columns) { 
      return D.cols(columns).elem(i);
      }", depends = "RcppArmadillo")
    

    Error Message:

    file6cf4cef8267.cpp:10:26: error: no member named 'elem' in 'arma::subview_elem2<double, arma::Mat<unsigned int>, arma::Mat<unsigned int> >'
      return D.cols(columns).elem(i);
             ~~~~~~~~~~~~~~~ ^
    1 error generated.
    make: *** [file6cf4cef8267.o] Error 1
    

    So, there is no way to subset a subview that was created by taking the a subset from an armadillo object.


    You may want to read up on a few of the subsetting features of Armadillo. They are immensely helpful.

    Disclaimer: Both the first and second links I've contributed to or written.