Search code examples
c++rperformancefor-looprcpp

Converting a R code into C++ for Rcpp implementation


I have this simple "for loop" written in R which I have to convert in to C++. Following is the reproducible example of R code:

# Parameters required 
a <- 1.8
b <- 1
time.dt <- 0.1
yp <- 40 
insp.int <- 7
ph <- 2000

dt <- seq(0,ph,time.dt) # Time sequence
MD.set <- c(seq(insp.int, ph, insp.int), ph) # Decision points to check and set next inspection date

# Initialization
cum.y <- rep(0,length = length(dt)) 
init.y <- 0
flag <- FALSE

# At each iteration, the following loop generates a gamma distributed random number and cum.y keeps taking cumulative sum
# The objective is to return a vector cum.y with a conditional cumulative sum of previous iteration 
# When dt[i] matches any values in MD.Set AND corresponding cum.y[i] is also >= yp it changes the flag to true (the last if)
# At the start of the loop it checks if dt[i] matches any values in MD.Set AND flag is also true. If yes, then cum.y is reset to 0. 

for (i in 2:length(dt)){
  if (dt[i] %in% MD.set && flag == TRUE){
    cum.y[i] <- 0
    init.y <- 0
    flag <- FALSE
    next
  } else {
    cum.y[i] <- init.y + rgamma(n = 1, shape = a*time.dt, scale = b)
    init.y <- cum.y[i]
    if (dt[i] %in% MD.set && cum.y[i] >= yp){
      flag <- TRUE
    }
  }
}

res <- cbind(dt, cum.y)

Having no prior experience with C++, I am running into numerous problems when trying to do so. I need to do this conversion only to be able to use it in Rcpp package in R. Because the code runs slowly in R, specially when time.dt becomes smaller and I am guessing C++ would do the job faster. Can you help with this?

UPDATE 2: This is my proposal for the conversion with the help in comments and answer. However, I am not sure what is the equivalent of next in C++. If I use continue it keeps going through the rest of the codes (and execute the ones after else. If I use break then it comes out of the loop after the condition is true.

NumericVector cumy(double a, double b, double timedt, NumericVector dt, NumericVector MDSet, double yp){
  bool flag = false;
  int n = dt.size();
  double total = 0;
  NumericVector out(n);
  unordered_set<int> sampleSet(MDSet.begin(), MDSet.end());
  
  for (int i = 0; i < n; ++i){
    if (sampleSet.find(dt[i]) != sampleSet.end() && flag == true){
      out[i] = 0;
      total = 0;
      flag = false;
      continue;
    } else {
      out[i] = total + rgamma(1, a*timedt, b)[0];
      total = out[i];
      if (sampleSet.find(dt[i]) != sampleSet.end() && out[i] >= yp){
        flag = true;
      }
    }
  }
  return out;
}

Solution

  • The error you are getting is simply because there's no automatic conversion from NumericVector to std::unordered_set<int>. You can fix this by doing:

    std::unordered_set<int> sampleSet( MDSet.begin(), MDSet.end() )
    

    This calls the unordered_set's constructor with beginning and end iterators of MDSet, which will populate the set with all values.

    There's also another issue in your code:

    if (sampleSet.find(dt[i]) == sampleSet.begin())
    

    This will only be true if dt[i] is found at the first element of sampleSet. From your r code, I'm assuming you are just checking to see if the value dt[i] is within sampleSet, in which case, you want:

    if (sampleSet.find(dt[i]) != sampleSet.end())
    

    In C++, STL find methods generally return an iterator, and when the value is not found, it returns the end iterator, so if find's return value is not end, then the value was found within the set.