Dealing with NA values using Rcpp


I'm testing a piece of my code, which is shown below:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix testOutMat(const int& ncols, const int& nrows, const NumericVector& col_prob){

  //Store row and column positions
  NumericVector col_pos = no_init(nrows);
  NumericVector row_pos = no_init(nrows);

  int row_val;
  int nz_counter=0;
  for(int j=0; j<ncols; ++j){ 
    for(int i=0; i<nrows; ++i){
      row_val = R::rbinom(1,col_prob[j]);
      Rcout << "i,j: " << i << "," << j << std::endl;
      Rcout << "val: " << row_val << std::endl;
      if(row_val==1){ //if (i,j)th entry is a 1, save location
        row_pos[i] = i;
        col_pos[i] = j;
        nz_counter += 1;
      } else{ //assign as NA
        row_pos[i] = NA_REAL;
        col_pos[i] = NA_REAL;
      }
      Rcout << "row_pos[i]: " << row_pos[i] << std::endl;
      Rcout << "col_pos[i]: " << col_pos[i] << std::endl;
      Rcout << "num non-zeros: " << nz_counter << std::endl;
    }
  }

  NumericMatrix out = no_init(nz_counter,2);

  Rcout << "Printing output matrix" << std::endl;
  for(int i=0; i<nz_counter; ++i){
    if(!Rcpp::NumericVector::is_na(row_pos[i])){ 
      out(i,0) = row_pos[i];
      out(i,1) = col_pos[i];
    }
    Rcout << "row_pos[i]: " << row_pos[i] << std::endl;
    Rcout << "col_pos[i]: " << col_pos[i] << std::endl; 
  }

  return out;
}

/*** R
set.seed(1)
res <- testOutMat(ncols=5,nrows=5,col_prob = runif(20, 0.1, 0.2))
*/

From the printed output I can see that the entries (i,j) = {(0,0), (3,1)} are non-zero, so res should be a 2x2 matrix with 0 0 in the first row and 3 1 in the second. However, I get something very different:

     [,1] [,2]
[1,]   64 1024
[2,]    1    4

I suspect that this is due to how I'm handling NAs. The overall goal of the function is to generate the row and column indices for non-zero elements (generated by the call to rbinom).

I've been trying to debug this for some time now and can't find a fix.


Solution

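  • The problem is that row_pos and col_pos are indexed only by i, so every pass over a new column j overwrites what was stored for the previous column, and nothing keeps track of the earlier results. By the end of the loops, the vectors hold only the values for the last column. That is compounded by your use of no_init(): out is allocated without being initialized, and the final loop only looks at the first nz_counter elements of row_pos. Whenever one of those is NA, the corresponding row of out is never written and keeps whatever happened to be in memory, which is the garbage you see. We can change your code just a bit so that each hit is appended instead of overwritten: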

    #include <Rcpp.h>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    IntegerMatrix testOutMat(const int ncols, const int nrows,
                             const NumericVector& col_prob) {
    
        IntegerMatrix binomial_deviates(nrows, ncols);
        IntegerVector row_positions;
        IntegerVector col_positions;
        int nz_counter = 0;
    
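        for ( int j = 0; j < ncols; ++j ) {
            // Draw a whole column of Bernoulli(col_prob[j]) deviates at once
            binomial_deviates(_, j) = rbinom(nrows, 1, col_prob[j]);
            for ( int i = 0; i < nrows; ++i ) {
                if ( binomial_deviates(i, j) == 1 ) {
                    // Append the hit so that earlier positions are kept.
                    // Note: push_back() on an Rcpp vector copies the whole
                    // vector each time, which is fine for small inputs.
                    row_positions.push_back(i);
                    col_positions.push_back(j);
                    nz_counter += 1;
                }
            }
        }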
    
        IntegerMatrix out(nz_counter, 2);
    
        for ( int i = 0; i < nz_counter; ++i ) {
            out(i, 0) = row_positions[i];
            out(i, 1) = col_positions[i];
        }
    
        return out;
    }
    
    /*** R
    set.seed(1)
    res <- testOutMat(ncols=5,nrows=5,col_prob = runif(20, 0.1, 0.2))
    */
    

    Result:

    > set.seed(1)
    
    > res <- testOutMat(ncols=5,nrows=5,col_prob = runif(20, 0.1, 0.2))
    > res
         [,1] [,2]
    [1,]    0    0
    [2,]    3    1
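
The append-instead-of-overwrite idea does not depend on Rcpp. As a minimal sketch in plain C++ (so it compiles without R), the hypothetical helper nonzero_positions below scans a 0/1 matrix column by column and appends each hit to a growing std::vector. The function name and the vector-of-rows representation are assumptions made for illustration only; they are not part of Rcpp or of the code above, and the input is assumed to be rectangular.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the zero-based (row, col)
// position of every non-zero entry, scanning column by column.
// Assumes m is rectangular, i.e. every row has the same length.
std::vector<std::pair<int, int>> nonzero_positions(
        const std::vector<std::vector<int>>& m) {
    std::vector<std::pair<int, int>> positions;
    if (m.empty()) return positions;

    const std::size_t nrows = m.size();
    const std::size_t ncols = m[0].size();

    for (std::size_t j = 0; j < ncols; ++j) {
        for (std::size_t i = 0; i < nrows; ++i) {
            if (m[i][j] != 0) {
                // Append rather than write to slot i, so hits found in
                // earlier columns are never overwritten.
                positions.emplace_back(static_cast<int>(i),
                                       static_cast<int>(j));
            }
        }
    }
    return positions;
}
```

Because the result grows only when a hit is found, there is no need for NA placeholders or for an uninitialized buffer, and the number of hits is simply positions.size(). In an Rcpp function, the same pattern could be used with std::vector<int> for the row and column indices, copying them into an IntegerMatrix once at the end.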