Search code examples
rrcpprcpparmadillo

Using sample() from within Rcpp


I have a matrix containing probabilities, with each of the four columns corresponding to a score (an integer in sequence from 0 to 4). I want to sample a single score for each row using the probabilities contained in that row as sampling weights. In rows where some columns do not contain probabilities (NAs instead), the sampling frame is limited to the columns (and their corresponding scores) which do (e.g. for a row with 0.45,0.55,NA,NA, either 0 or 1 would be sampled). However, I get this error (followed by several others), so how can I make it work?:

error: no matching function for call to 'as<Rcpp::IntegerVector>(Rcpp::Matrix<14>::Sub&)'
     score[i] = sample(scrs,1,true,as<IntegerVector>(probs));

Existing answers suggest RcppArmadillo is the solution but I can't get that to work either. If I add: require(RcppArmadillo) before the cppFunction and score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs); in place of the existing sample() statement, I get:

error: 'Rcpp::RcppArmadillo' has not been declared
     score[i] = Rcpp::RcppArmadillo::sample(scrs,1,true,probs);

Or if I also include, #include <RcppArmadilloExtensions/sample.h> at the top, I get:

fatal error: RcppArmadilloExtensions/sample.h: No such file or directory
   #include <RcppArmadilloExtensions/sample.h>

Reproducible code:

p.vals <- matrix(c(0.44892077,0.55107923,NA,NA,
                 0.37111195,0.62888805,NA,NA,
                 0.04461714,0.47764478,0.303590351,1.741477e-01,
                 0.91741642,0.07968127,0.002826406,7.589714e-05,
                 0.69330800,0.24355559,0.058340934,4.795468e-03,
                 0.43516823,0.43483784,0.120895859,9.098067e-03,
                 0.73680809,0.22595438,0.037237525,NA,
                 0.89569365,0.10142719,0.002879163,NA),nrow=8,ncol=4,byrow=TRUE)

step.vals <- c(1,1,3,3,3,3,2,2)

require(Rcpp)
cppFunction('IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){

  int prows = p.nrow();

  IntegerVector score(prows);
  
  for(int i=0;i<prows;i++){
    int step = steps[i];
    
    IntegerVector scrs = seq(0,step);
    
    int start = 0;
    int end = step;
    
    NumericMatrix::Sub probs = p(Range(i,i),Range(start,end));

    score[i] = sample(scrs,1,true,probs);
  }
  
  return score;
  
}')

test <- scores_cpp(p.vals,step.vals)
test

Note: the value of step.vals for each row is always equal to the number of columns containing probabilities in that row -1. So passing the step.values to the function may be extraneous.


Solution

  • Many thanks for the pointers. I also had some problems with indexing the matrix, so that part is changed, too. The following code works as intended (using sourceCpp()):

    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadillo.h>
    
    #include <RcppArmadilloExtensions/sample.h>
    
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    
    IntegerVector scores_cpp(NumericMatrix p, IntegerVector steps){
      
      int prows = p.nrow();
      
      IntegerVector score(prows);
      
      for(int i=0;i<prows;i++){
        int step = steps[i];
        
        IntegerVector scrs = seq(0,step);
        
        NumericMatrix probs = p(Range(i,i),Range(0,step));
    
        IntegerVector sc = RcppArmadillo::sample(scrs,1,true,probs);
        score[i] = sc[0];
      }
      
      return score;
      
    }