
Obtaining different results from sum() and '+'


Below is my experiment:

> xx = 293.62882204364098
> yy = 0.086783439604999998
> print(xx + yy, 20)
[1] 293.71560548324595175
> print(sum(c(xx,yy)), 20)
[1] 293.71560548324600859

It seems strange to me that sum() and + give different results when both are applied to the same numbers.

Is this result expected?

How can I get the same result?

Which one is most efficient?


Solution

  • There is an r-devel thread here that includes some detailed description of the implementation. In particular, from Tomas Kalibera:

    R uses long double type for the accumulator (on platforms where it is available). This is also mentioned in ?sum: "Where possible extended-precision accumulators are used, typically well supported with C99 and newer, but possibly platform-dependent."

    This would imply that sum() is more accurate, although this comes with a giant flashing warning sign that if this level of accuracy is important to you, you should be very worried about the implementation of your calculations [in terms both of algorithms and underlying numerical implementations].

    I answered a question here where I eventually figured out (after some false starts) that the difference between + and sum() is due to the use of extended precision for sum().

    This code shows that the sums of the individual arguments (as in sum(xx,yy)) are added together with + (in C), whereas this code is used to sum the elements within each argument; line 154 (LDOUBLE s=0.0) shows that the accumulator is stored in extended precision (if available).

    I believe (but would be happy to be corrected) that @JonSpring's timing results are probably explained by two things: (1) sum(xx,yy) involves more processing, type-checking, etc. than +; (2) sum(c(xx,yy)) is slightly slower than sum(xx,yy) because it works in extended precision.
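
To make the explanation above concrete, here is a minimal sketch that compares the three ways of adding the two numbers from the question and checks whether the current R build has a long double type. The variable names (has_ld, plus_result, sum_vec, sum_args, diff_vec, same_within_tol) are illustrative and not from the original post. Whether the results differ at all depends on the platform, so no expected output is shown. The last step uses all.equal(), which compares within a tolerance and is one way to treat the two results as "the same".

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

# Does this build of R have a usable long double type? (platform-dependent)
has_ld <- capabilities("long.double")

plus_result <- xx + yy          # ordinary double-precision addition
sum_vec     <- sum(c(xx, yy))   # one vector argument, accumulated inside sum()
sum_args    <- sum(xx, yy)      # two separate arguments passed to sum()

# Difference between the two results from the question; on platforms
# without extended precision this may be exactly zero.
diff_vec <- sum_vec - plus_result

# Tolerance-based comparison instead of exact equality
same_within_tol <- isTRUE(all.equal(plus_result, sum_vec))

cat("long double available:", has_ld, "\n")
print(c(plus = plus_result, sum_vec = sum_vec, sum_args = sum_args), digits = 20)
cat("sum(c(xx, yy)) - (xx + yy) =", format(diff_vec, digits = 5), "\n")
cat("equal within tolerance:", same_within_tol, "\n")
```

In the output shown in the question, the two values differ by about 5.7e-14, which is the spacing between adjacent doubles near 293. So the discrepancy is a single rounding step rather than a meaningful numerical difference, and a tolerance-based comparison treats the results as equal.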