I've been playing around with using more efficient data structures and parallel processing and a few other things. I've made good progress getting a script from running in ~60 seconds down to running in about ~9 seconds.
The one thing I can't for the life of me get my head around though is writing a loop in Rcpp. Specifically, a loop that calculates line-by-line depending on previous-line results and updates the data as it goes.
Wondering if someone could convert my code into Rcpp that way I can back-engineer and figure out, with an example that I'm very familiar with, how its done.
It's a loop that calculates the result of 3 variables at each line. Line 1 has to be calculated separately, and then line 2 onwards calculates based on values from the current and previous lines.
This example code is just 6 lines long but my original code is many thousands:
temp <- matrix(c(0, 0, 0, 2.211, 2.345, 0, 0.8978, 1.0452, 1.1524, 0.4154,
0.7102, 0.8576, 0, 0, 0, 1.7956, 1.6348, 0,
rep(NA, 18)), ncol=6, nrow=6)
const1 <- 0.938
for (p in 1:nrow(temp)) {
if (p==1) {
temp[p, 4] <- max(min(temp[p, 2],
temp[p, 1]),
0)
temp[p, 5] <- max(temp[p, 3] + (0 - const1),
0)
temp[p, 6] <- temp[p, 1] - temp[p, 4] - temp[p, 5]
}
if (p>1) {
temp[p, 4] <- max(min(temp[p, 2],
temp[p, 1] + temp[p-1, 6]),
0)
temp[p, 5] <- max(temp[p, 3] + (temp[p-1, 6] - const1),
0)
temp[p, 6] <- temp[p-1, 6] + temp[p, 1] - temp[p, 4] - temp[p, 5]
}
}
Thanks in advance, hopefully this takes someone with Rcpp skills just a minute or two!
Here is an the sample Rcpp equivalent code:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix getResult(NumericMatrix x, double const1){
for (int p = 0; p < x.nrow(); p++){
if (p == 0){
x(p, 3) = std::max(std::min(x(p, 1), x(p, 0)), 0.0);
x(p, 4) = std::max(x(p, 2) + (0.0 - const1), 0.0);
x(p, 5) = x(p, 0) - x(p, 3) - x(p, 4);
}
if (p > 0){
x(p, 3) = std::max(std::min(x(p, 1), x(p, 0) + x(p - 1, 5)), 0.0);
x(p, 4) = std::max(x(p, 2) + (x(p - 1, 5) - const1), 0.0);
x(p, 5) = x(p - 1, 5) + x(p, 0) - x(p, 3) - x(p, 4);
}
}
return x;
}
A few notes:
Rcpp::sourceCpp("myCode.cpp")
in your session to compile it and make it available within the session.NumericMatrix
here to represent the matrix.std::max
and std::min
respectively. These functions require two common data types, i.e. if we do max(x, y)
, both x
and y
must be of the same type. Numeric matrix entries are double
(I believe), so you need to provide a double
; hence, the change from 0
(an int in C++) to 0.0
(a double)temp[1, 4]
to temp(0, 3)
Update: If x
was a list of vectors, here's an approach:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List getResult(List x, double const1){
// Create a new list from x called `res`
Rcpp::List res(x);
for (int p = 0; p < x.size(); p++){
// Initiate a NumericVector `curr` with the contents of `res[p]`
Rcpp::NumericVector curr(res[p]);
if (p == 0){
curr(3) = std::max(std::min(curr(1), curr(0)), 0.0);
curr(4) = std::max(curr(2) + (0.0 - const1), 0.0);
curr(5) = curr(0) - curr(3) - curr(4);
}
if (p > 0){
// Initiate a NumericVector `prev` with the contents of `res[p-1]`
Rcpp::NumericVector prev(res[p-1]);
curr(3) = std::max(std::min(curr(1), curr(0) + prev(5)), 0.0);
curr(4) = std::max(curr(2) + (prev(5) - const1), 0.0);
curr(5) = prev(5) + curr(0) - curr(3) - curr(4);
}
}
return x;
}