r, rcpp

Problem with automatic casting of a logical vector to integer


The R API allows a SEXP to be handled directly through a pointer, which simplifies any processing that would otherwise depend on casting away from the original data type. For example, we can use unsigned int to handle a SEXP of real or integer type. The problem is that R can automatically cast a logical SEXP to an integer one. R's internal headers define logical as the C integer type, which (I think) leads to an inconsistent state. For example, if I use this code:

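#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
SEXP test(SEXP x){
  int* arr = INTEGER(x);  // raw pointer to the vector's payload
  arr[0] = 77;            // overwrites the first element in place
  return x;
}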

and I run in R:

x <- NA          ## by default, NA is a logical vector
is.logical(x)    ## returns TRUE
test(x)          ## returns TRUE
is.logical(x)    ## returns TRUE
print(x + 0L)    ## would normally be TRUE (i.e. 1), but it gives 77
max(x)           ## gives 77!

Most basic functions (sum, max, min, ...) treat x as an integer. A similar problem arises with Rcpp, which blocks in-place modification. For example:

// [[Rcpp::export]]
IntegerVector test1(IntegerVector x){
  x[0]=77;
  return x;
} 

Using R:

x <- NA
test1(x)            ## x is still NA
x <- as.integer(x)
test1(x)            ## x is changed to 77

Finally, is there a way to get around this critical cast from logical to integer?


Solution

  • A logical in R has the same number of bytes per element as an integer (4 bytes). This is different from C, where a bool has 1 byte* and an int has 4 bytes. R probably does this because, with this approach, up-casting logical to integer is instantaneous and vector multiplication between logical and integer has no overhead.

    What you're doing in both cases is to access the pointer to the start of the vector and set the first 4 bytes to the value that would correspond to 77.

    On the R side, the variable named "x" still points to the same underlying data. But since you changed that underlying data, x now holds the bytes that correspond to an int of 77.

    An int of 77 doesn't mean anything as a logical, since it can't arise in basic operations. So what R does when you force an impossible value into a logical is essentially unspecified.

    A logical in R can only have three values: TRUE (corresponds to a value of 1), FALSE (corresponds to a value of 0) and NA (corresponds to a value of -2147483648).

    *(Technically, this is implementation-defined, but I've only ever seen it as 1 byte.)
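
The byte-level picture described above can be sketched outside of R in plain C++. This is only an illustration under stated assumptions: the names kFalse, kTrue, kNaLogical, is_valid_logical and overwrite_first are hypothetical stand-ins and are not part of R's or Rcpp's headers, and a 32-bit int is assumed.

```cpp
#include <climits>

// Assumption: int is 4 bytes, matching the per-element size described above.
static_assert(sizeof(int) == 4, "this sketch assumes a 32-bit int");

// Hypothetical stand-ins for the three payloads a logical element may hold.
constexpr int kFalse = 0;
constexpr int kTrue = 1;
constexpr int kNaLogical = INT_MIN;  // -2147483648 for a 32-bit int

// Returns true only for the three legal logical payloads.
bool is_valid_logical(int payload) {
    return payload == kFalse || payload == kTrue || payload == kNaLogical;
}

// Mimics the question's test(): writes 77 through an int pointer into
// storage that the caller still regards as "logical".
void overwrite_first(int* storage) {
    storage[0] = 77;
}
```

Starting from a cell that holds kNaLogical, calling overwrite_first leaves 77 in that cell, and is_valid_logical then returns false. That mirrors the state the question's x ends up in: still tagged as logical, but carrying a payload outside the three legal values.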