Checking for NULL and NA values of a vector in Rcpp


I'm trying to evaluate the sum of a vector (y) conditional on whether the values of a second, nullable vector (r) are NA or not. If the second vector r is NULL, all of the values of y should be summed. If all elements of r are NA, the function should return NA. Please see the end of the text for the desired output.

I tried the following code first:

library(Rcpp)
cppFunction('double foo(NumericVector y, Rcpp::Nullable<Rcpp::IntegerVector> r = R_NilValue) {
  double output = 0;
  bool return_na = !Rf_isNull(r);
  int y_count = y.size();
  for (int i = 0; i < y_count; i++) {
    if (Rf_isNull(r)  || !R_IsNA(r[i])) {
    //// if (Rf_isNull(r)  || !R_IsNA(as<IntegerVector>(r)[i])) {
      if (!Rf_isNull(r))
        Rcout << R_IsNA(as<IntegerVector>(r)[i]) << " - "<< as<IntegerVector>(r)[i] << std::endl;
      output = output + y[i];
      return_na = false;
    } 
  }
  if (return_na) 
    return NA_REAL;
  return output;
}')

This gave me the following error:

 error: invalid use of incomplete type 'struct SEXPREC'
     if (Rf_isNull(r)  || !R_IsNA(r[i])) {
                                     ^

To work around it, I used if (Rf_isNull(r) || !R_IsNA(as<IntegerVector>(r)[i])) { instead. But now, when r is converted to an integer vector, the NA values become plain integers (NA_INTEGER), and R_IsNA(), which tests a double, no longer reports them as NA. As a result, the NA elements are summed as well.
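The behaviour of the second attempt can be reproduced without R. The sketch below is only an illustration, assuming R's convention that NA_INTEGER is the smallest int (INT_MIN) and that R_IsNA() takes a double and can only return true for a NaN value. The names kNaInteger, is_na_int and nan_after_widening are hypothetical stand-ins, not part of R or Rcpp.

```cpp
#include <climits>
#include <cmath>

// Hypothetical stand-in for R's NA_INTEGER, which R defines as INT_MIN.
constexpr int kNaInteger = INT_MIN;

// An integer NA can only be found by comparing against the sentinel value.
bool is_na_int(int x) { return x == kNaInteger; }

// Models what happens when an int is handed to a check that expects a double:
// the int is widened to an ordinary number, so it is never NaN.
bool nan_after_widening(int x) {
    double widened = static_cast<double>(x);
    return std::isnan(widened);
}
```

Here is_na_int(kNaInteger) is true, while nan_after_widening(kNaInteger) is false: once the sentinel has been widened to a double it is just a large negative number, which is why a double-based test does not see the integer NA.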

Here is the expected output that I want.

foo(1:4, NULL) #  <- This should return 10 = 1 + 2 + 3 + 4
foo(1:4, c(1, 1, 1, 1)) #  <- This should return 10 = 1 + 2 + 3 + 4
foo(1:4, c(1, 1, NA, 1)) #  <- This should return 7 = 1 + 2 + 4
foo(1:4, c(NA, NA, NA, NA)) # <- This should return NA

How can I get the function that I want? (This example is simplified. I'm not particularly interested in the sum function itself, but in checking for NA and NULL at the same time, as in the example.)


Solution

  • Three suggestions:

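    • Use Rcpp's own functions (r.isNull(), Rcpp::is_na()) instead of R's C API (Rf_isNull(), R_IsNA()).
    • Return early when r is NULL.
    • Create a LogicalVector that marks the NA elements of r before looping through the input vector.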
    #include <Rcpp.h>
    
    // [[Rcpp::export]]
    double foo(Rcpp::NumericVector y, Rcpp::Nullable<Rcpp::IntegerVector> r = R_NilValue) {
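        // r is NULL: nothing to filter, so sum all of y.
        if (r.isNull())
            return Rcpp::sum(y);
    
        // TRUE wherever r is NA; r.as() yields the underlying IntegerVector.
        Rcpp::LogicalVector mask = Rcpp::is_na(r.as());
        // Every element of r is NA: return NA.
        if (Rcpp::is_true(Rcpp::all(mask)))
            return NA_REAL;
    
        // Sum the elements of y whose counterpart in r is not NA.
        double output = 0.0;
        int y_count = y.size();
        for (int i = 0; i < y_count; ++i) {
            if (!mask[i]) {
                output += y[i];
            }
        }
        return output;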
    }
    
    /***R
    foo(1:4, NULL) #  <- This should return 10 = 1 + 2 + 3 + 4
    foo(1:4, c(1, 1, 1, 1)) #  <- This should return 10 = 1 + 2 + 3 + 4
    foo(1:4, c(1, 1, NA, 1)) #  <- This should return 7 = 1 + 2 + 4
    foo(1:4, c(NA, NA, NA, NA)) # <- This should return NA
    */ 
    

    Result:

    > Rcpp::sourceCpp('60569482.cpp')
    
    > foo(1:4, NULL) #  <- This should return 10 = 1 + 2 + 3 + 4
    [1] 10
    
    > foo(1:4, c(1, 1, 1, 1)) #  <- This should return 10 = 1 + 2 + 3 + 4
    [1] 10
    
    > foo(1:4, c(1, 1, NA, 1)) #  <- This should return 7 = 1 + 2 + 4
    [1] 7
    
    > foo(1:4, c(NA, NA, NA, NA)) # <- This should return NA
    [1] NA
    

    Further suggestion:

    • Use the mask for sub-setting y.
    #include <Rcpp.h>
    
    // [[Rcpp::export]]
    double foo(Rcpp::NumericVector y, Rcpp::Nullable<Rcpp::IntegerVector> r = R_NilValue) {
        if (r.isNull())
            return Rcpp::sum(y);
    
        Rcpp::LogicalVector mask = Rcpp::is_na(r.as());
        if (Rcpp::is_true(Rcpp::all(mask))) 
            return NA_REAL;
    
        // Keep only the elements of y whose counterpart in r is not NA.
        Rcpp::NumericVector tmp = y[!mask];
        return Rcpp::sum(tmp);
    }
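
    A side note on lengths: the loop-based version indexes mask up to y.size() and does not itself check that r is as long as y. As a rough sketch of the same control flow with such a guard, here is a model in standard C++ only. It is not Rcpp code: std::optional stands in for Rcpp::Nullable, INT_MIN for NA_INTEGER and a quiet NaN for NA_REAL, and the names foo_model and kNaInteger are hypothetical.

```cpp
#include <climits>
#include <cmath>
#include <cstddef>
#include <limits>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for R's NA_INTEGER (INT_MIN).
constexpr int kNaInteger = INT_MIN;

// Plain C++ model of foo(): std::nullopt plays the role of NULL,
// and a quiet NaN plays the role of NA_REAL.
double foo_model(const std::vector<double>& y,
                 const std::optional<std::vector<int>>& r = std::nullopt) {
    double total = 0.0;
    if (!r) {  // r is "NULL": return early with the sum of all of y
        for (double v : y) total += v;
        return total;
    }
    // Assumed extra guard, not part of the answer's code.
    if (r->size() != y.size())
        throw std::invalid_argument("y and r must have the same length");

    bool any_kept = false;
    for (std::size_t i = 0; i < y.size(); ++i) {
        if ((*r)[i] != kNaInteger) {  // keep y[i] only where r[i] is not NA
            total += y[i];
            any_kept = true;
        }
    }
    // Every element of r was NA: signal a missing result.
    return any_kept ? total : std::numeric_limits<double>::quiet_NaN();
}
```

    In the Rcpp function itself, a comparable guard would presumably compare the length of the mask with y.size() and call Rcpp::stop() on a mismatch.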