
Problems with sum functions

Using other programming languages to accelerate R is a big innovation, but it seems hazardous as well. Any statistical function, such as sum, sd or cor, has to be stable to be usable: it must give the same result for the same input. The problem I have appears with large amounts of data (10^8 values). I used the RCPP and RCPPPARALLEL versions of sum to demonstrate the bugs on big data.

#include <Rcpp.h>
using namespace Rcpp;

// Rcpp sugar version of sum(), exported to R as RCPPSUM()
// [[Rcpp::export]]
double RCPPSUM(NumericVector x) {
  return sum(x);
}

/*** R
options(digits = 22)  # print more digits

## Defines parallelVectorSum() (RcppParallel) and vectorSum() (C++ STL)
sourceCpp(system.file("tests/cpp/sum.cpp", package = "RcppParallel"))

## A very big vector (10^8 values). How x was built is not shown,
## so runif() here is only a placeholder.
x <- runif(1e8)
v <- numeric(100)

## Simple stability check: repeat each sum 100 times, then look at sd(v)
for (i in 1:100) v[i] <- sum(x)
sd(v)  # stable, sd = 0
for (i in 1:100) v[i] <- vectorSum(x)
sd(v)  # stable, sd = 0
for (i in 1:100) v[i] <- RCPPSUM(x)
sd(v)  # stable, sd = 0
for (i in 1:100) v[i] <- parallelVectorSum(x)
sd(v)  # unstable, sd != 0
*/
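A plausible explanation for sd != 0 in the parallel version (an assumption, not checked against RcppParallel's source): floating-point addition is not associative, and a parallel reduction may split and join the work differently from one run to the next. The minimal C++ sketch below uses a hypothetical helper, `chunked_sum`, that splits a vector into a fixed number of contiguous chunks, sums each chunk, then adds the partial sums. On the same data, one chunk and two chunks give different answers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not RcppParallel code): sums x in `chunks`
// contiguous pieces, then adds the partial sums left to right.
// chunks == 1 is a plain sequential loop; larger values mimic how a
// parallel reduction might divide the work between threads.
double chunked_sum(const std::vector<double>& x, std::size_t chunks) {
    assert(chunks > 0);
    const std::size_t n = x.size();
    const std::size_t size = (n + chunks - 1) / chunks;  // ceil(n / chunks)
    double total = 0.0;
    for (std::size_t begin = 0; begin < n; begin += size) {
        const std::size_t end = std::min(begin + size, n);
        double partial = 0.0;  // one "worker's" partial sum
        for (std::size_t i = begin; i < end; ++i) partial += x[i];
        total += partial;  // join step
    }
    return total;
}
```

With `x = {1e16, 1.0, -1e16, 1.0}`, `chunked_sum(x, 1)` returns 1.0 while `chunked_sum(x, 2)` returns 0.0: each 1.0 is absorbed when it is added to ±1e16 inside a chunk. Every step is correctly rounded; only the order differs. If the split varies between runs, the result varies too, while a fixed split always gives the same (still inexact) answer.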

We can qualify the parallel version as unstable, so it is excluded. Between RCPP and RSUM (R's own sum()), which result is really the truth?
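On the question of which result is "the truth": in general neither is exact, because each one rounds differently. A likely source of the gap between sum() and RCPPSUM is the accumulator. R's sum() is commonly described as accumulating doubles in a `long double`, whereas a simple C++ loop (which Rcpp sugar's `sum()` and `std::accumulate` are assumed here to use) accumulates in `double`. The sketch below compares three accumulators on a tiny input. The function names are hypothetical, and `long_double_sum` only approximates R's behaviour; it is not R's code.

```cpp
#include <cassert>
#include <vector>

// Plain double accumulator, as in a simple C++ loop.
double naive_sum(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v;
    return s;
}

// Wider accumulator, rounded to double only at the end (an approximation
// of what R's sum() is believed to do). Where long double is the same
// type as double (e.g. MSVC, Apple silicon), this equals naive_sum.
double long_double_sum(const std::vector<double>& x) {
    long double s = 0.0L;
    for (double v : x) s += v;
    return static_cast<double>(s);
}

// Kahan (compensated) summation: keeps the rounding error of each
// addition in c and feeds it back into the next addition.
double kahan_sum(const std::vector<double>& x) {
    double s = 0.0;
    double c = 0.0;
    for (double v : x) {
        double y = v - c;   // corrected next term
        double t = s + y;   // new (rounded) sum
        c = (t - s) - y;    // what was lost in that rounding
        s = t;
    }
    return s;
}
```

On `{1e16, 1.0, 1.0}` the exact sum is 1e16 + 2. The plain double loop loses both 1.0 terms and returns 1e16, Kahan summation recovers 1e16 + 2, and the `long double` version gives 1e16 + 2 only on platforms where `long double` is wider than `double`. Which answer is closer to the truth depends on the accumulator, not on the language.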


  • Briefly:

    1. Conjecture alert: "seems hazardous as well". Please back this up or remove it.
    2. Please show the courtesy of proper capitalization. We write these as Rcpp and RcppParallel, respectively. Though I guess you just refer to labels, so it doesn't really matter.
    3. Using options(digits=22) is cute but useless. Double precision gives you around 16 digits. Just hoping for more does not give you more.
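Comment 3 can be checked directly. For an IEEE 754 double, `std::numeric_limits<double>::digits10` is 15 (decimal digits that always survive a decimal to double to decimal trip) and `max_digits10` is 17 (digits needed to print any double so that it reads back exactly). The sketch below, with hypothetical helper names, prints a value with n significant digits and tests whether it parses back to the same double. Since 17 digits always suffice, asking for 22 only shows more of the exact binary value, not more accuracy.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Format x with n significant digits (printf "%.*g").
std::string with_digits(double x, int n) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", n, x);
    return std::string(buf);
}

// True if printing x with n significant digits and parsing the text
// back yields exactly the same double.
bool round_trips(double x, int n) {
    return std::strtod(with_digits(x, n).c_str(), nullptr) == x;
}
```

For example, `0.1 + 0.2` printed with 15 digits reads back as a different double, while 17 digits always round-trip. To compare two sums, comparing the doubles themselves (`==` in C++, `identical()` in R) is more reliable than reading printed digits.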