Check a boolean expression on a data frame with Rcpp (C++)


I have a data frame dat with data and a character vector rule with logical rules:

set.seed(124)
ro <- round(runif(n = 30,1,10),2)
dat <- as.data.frame(matrix(data =ro,ncol = 3)) ; colnames(dat) <- paste0("x" ,1:ncol(dat))
rule <- c("x1 > 5 & x2/2 > 2"  ,  "x1 > x2*2"  ,  "x3!=4")

I need to check, for each row, whether the expression is true:

id <- 2
for (i in 1:nrow(dat)) {
  cr <- with(data = dat[i, ], expr = eval(parse(text = rule[id])))
  print(cr)
}
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] FALSE
[1] TRUE
[1] FALSE
[1] FALSE
[1] FALSE
[1] TRUE

How can I do this with Rcpp?


Solution

  • Two things are worth stressing here:

    • you do not need a loop over all rows, as R is vectorized and that is already fast

    • you can sweep the rules over your data and return a result matrix

    Both of those are a one-liner:

    > res <- do.call(cbind, lapply(rule, \(r) with(dat, eval(parse(text=r)))))
    > res
           [,1]  [,2] [,3]
     [1,] FALSE FALSE TRUE
     [2,] FALSE FALSE TRUE
     [3,]  TRUE FALSE TRUE
     [4,] FALSE FALSE TRUE
     [5,] FALSE FALSE TRUE
     [6,] FALSE  TRUE TRUE
     [7,]  TRUE FALSE TRUE
     [8,]  TRUE FALSE TRUE
     [9,]  TRUE FALSE TRUE
    [10,] FALSE  TRUE TRUE

    (I used the anonymous function shorthand introduced in R 4.1.0 there; you can replace \(r) with the standard function(r) as well.)

    As this is already vectorised, it will be faster than your per-row call, and even if you did it with Rcpp, it would not be (much) faster than the already vectorised code.
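For comparison, here is a rough sketch of what a compiled version might look like. It is plain standard C++ rather than Rcpp, so it compiles on its own. As far as I know, C++ has no way to parse the R expression strings in rule, so the three rules are translated by hand into lambdas here. The names Columns, Rule, make_rules and check_rules are made up for this illustration, and the code assumes all three columns have the same length.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the three columns x1, x2, x3 of `dat`.
// Assumption: all three vectors have the same length.
struct Columns {
    std::vector<double> x1, x2, x3;
};

// A rule is a predicate evaluated on the values of one row.
using Rule = std::function<bool(double, double, double)>;

// Hand-translated versions of the three strings in `rule`.
// They are hard-coded because nothing here parses R expressions.
std::vector<Rule> make_rules() {
    return {
        [](double x1, double x2, double) { return x1 > 5 && x2 / 2 > 2; },
        [](double x1, double x2, double) { return x1 > x2 * 2; },
        [](double, double, double x3) { return x3 != 4; }
    };
}

// Returns a rows-by-rules table: res[i][j] is true when row i satisfies rule j.
std::vector<std::vector<bool>> check_rules(const Columns& d,
                                           const std::vector<Rule>& rules) {
    const std::size_t n = d.x1.size();
    std::vector<std::vector<bool>> res(n, std::vector<bool>(rules.size()));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < rules.size(); ++j) {
            res[i][j] = rules[j](d.x1[i], d.x2[i], d.x3[i]);
        }
    }
    return res;
}
```

Calling check_rules(d, make_rules()) gives the same rows-by-rules layout as res above. The catch is that every change to rule means editing and recompiling the C++ code, whereas the R one-liner evaluates whatever strings it is given, which is one more reason to stay with the vectorised R version.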