Using another programming language to accelerate R is a big innovation, but it seems hazardous as well. Any statistical function has to be stable to be usable, that is, it has to give the same result for the same input. This applies to sum, sd, cor, ...
The problem I have appears with a large amount of data (10^8 samples). I used the RCPP and RCPPPARALLEL versions of sum to demonstrate the problem on big data.
#include <Rcpp.h>
using namespace Rcpp;
/********************************************************
 * Rcpp built-in function: sum
 *******************************************************/
// [[Rcpp::export]]
double RCPPSUM(NumericVector x) {
  return sum(x);
}
/*** R
library("Rcpp")
library("RcppParallel")
options(digits=22) # print more digits of each number
sourceCpp(system.file("tests/cpp/sum.cpp", package = "RcppParallel"))
## This provides two functions: parallelVectorSum (RcppParallel) and vectorSum (C++ STL)
## We work on a very big vector
x<-rnorm(100000000)
(RSUM<-sum(x))
(RCPP<-RCPPSUM(x))
(RCPPPARALLEL<-parallelVectorSum(x))
(STD<-vectorSum(x))
identical(RSUM,RCPP)
identical(STD,RCPP)
identical(RSUM,RCPPPARALLEL)
identical(RCPP,RCPPPARALLEL)
## Simple stability check: repeat each sum 100 times on the same vector
v<-0
for(i in 1:100)v[i]<-sum(x)
(mean(v))
(sd(v))
## stable: sd = 0
for(i in 1:100)v[i]<-vectorSum(x)
(mean(v))
(sd(v))
## stable: sd = 0
for(i in 1:100)v[i]<-RCPPSUM(x)
(mean(v))
(sd(v))
## stable: sd = 0
for(i in 1:100)v[i]<-parallelVectorSum(x)
(mean(v))
(sd(v))
## unstable: sd != 0
*/
We can qualify the parallel version as unstable, so it is excluded. Of the RCPP and RSUM results, which one is actually correct?
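A likely reason for differences of this kind is that floating-point addition is not associative, so the grouping of the additions changes the rounded result. The sketch below illustrates this under the assumption that a parallel sum splits the vector into chunks and then adds the partial sums. The helper names sequential_sum and chunked_sum are hypothetical and are not part of Rcpp or RcppParallel.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Plain left-to-right accumulation in double precision.
double sequential_sum(const std::vector<double>& x) {
    return std::accumulate(x.begin(), x.end(), 0.0);
}

// Sum the vector in `chunks` contiguous pieces, then add the partial sums.
// This mimics the grouping a parallel reduction might use (an assumption,
// not the actual RcppParallel implementation).
double chunked_sum(const std::vector<double>& x, std::size_t chunks) {
    double total = 0.0;
    const std::size_t n = x.size();
    for (std::size_t c = 0; c < chunks; ++c) {
        const auto first = x.begin() + static_cast<std::ptrdiff_t>(c * n / chunks);
        const auto last = x.begin() + static_cast<std::ptrdiff_t>((c + 1) * n / chunks);
        total += std::accumulate(first, last, 0.0);
    }
    return total;
}
```

For the four values {1e16, 1.0, -1e16, 1.0}, sequential_sum returns 1.0 and chunked_sum with two chunks returns 0.0, while the exact sum is 2.0. Neither grouping is "the truth" in double arithmetic; they are different roundings of the same exact value.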
Briefly:
The names are Rcpp and RcppParallel, respectively. Though I guess you just refer to labels so it doesn't really matter.
options(digits=22) is cute but useless. Double precision gives you around 16 digits. Just hoping for more does not give you more.
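To make the point about digits concrete, here is a minimal C++ sketch, assuming IEEE 754 doubles as used on common platforms. The helpers format_sig and round_trips are hypothetical names introduced only for this illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Format a double with `digits` significant digits, similar in spirit to
// printing with a larger digits setting.
std::string format_sig(double value, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, value);
    return std::string(buf);
}

// True if printing with `digits` significant digits and parsing the text
// back recovers exactly the same double.
bool round_trips(double value, int digits) {
    return std::strtod(format_sig(value, digits).c_str(), nullptr) == value;
}

// Decimal digits a double is guaranteed to preserve (15), and the number
// of digits needed to identify any double uniquely (17).
constexpr int kGuaranteedDigits = std::numeric_limits<double>::digits10;
constexpr int kRoundTripDigits = std::numeric_limits<double>::max_digits10;
```

Calling format_sig(0.1, 22) gives "0.1000000000000000055511": the extra digits are just the decimal expansion of the nearest representable double, not additional precision. Seventeen significant digits are already enough for round_trips to succeed for any double.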