Below is my experiment:
> xx = 293.62882204364098
> yy = 0.086783439604999998
> print(xx + yy, 20)
[1] 293.71560548324595175
> print(sum(c(xx,yy)), 20)
[1] 293.71560548324600859
It is strange to me that sum() and + give different results when both are applied to the same numbers.
Is this result expected?
How can I get the same result?
Which one is more efficient?
There is an r-devel thread here that includes some detailed description of the implementation. In particular, from Tomas Kalibera:
R uses long double type for the accumulator (on platforms where it is available). This is also mentioned in ?sum: "Where possible extended-precision accumulators are used, typically well supported with C99 and newer, but possibly platform-dependent."
This would imply that sum() is more accurate, although this comes with a giant flashing warning sign: if this level of accuracy is important to you, you should be very worried about the implementation of your calculations [in terms of both algorithms and underlying numerical implementations].
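A quick way to check whether this applies on a given machine, and to get results that agree, might look like the sketch below. It assumes nothing beyond base R: capabilities("long.double") and .Machine$sizeof.longdouble report whether the build uses a long double type, and Reduce() with `+` is used here only as one possible way to accumulate in ordinary double precision. On platforms where long double is no wider than double, the exact comparison may well come out TRUE.

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

# Does this build of R use a C long double type?
print(capabilities("long.double"))
# Size of long double in bytes; 0 if the type is absent or disabled
print(.Machine$sizeof.longdouble)

plus_result <- xx + yy
sum_result  <- sum(c(xx, yy))

# Exact comparison: may be FALSE where sum() accumulates in extended precision
print(identical(plus_result, sum_result))

# Comparison with a tolerance: treats a last-digit difference as equal
print(all.equal(plus_result, sum_result))

# Folding with `+` stays in double precision, so it reproduces xx + yy
reduce_result <- Reduce(`+`, c(xx, yy))
print(identical(reduce_result, plus_result))
```

For most purposes the all.equal() comparison is the relevant one; the Reduce() form only matters if bit-for-bit agreement with + is required.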
I answered a question here where I eventually figured out (after some false starts) that the difference between + and sum() is due to the use of extended precision for sum().
This code shows that the sums of the individual arguments (as in sum(xx,yy)) are added together with + (in C), whereas this code is used to sum the components within each argument; line 154 (LDOUBLE s=0.0) shows that the accumulator is stored in extended precision (if available).
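The effect of the accumulator can be observed from the R side by comparing the three call forms and expressing any gap in units of the spacing between adjacent doubles (an ulp). This is only a sketch: ulp_at is a hypothetical helper written for this illustration, not part of R, and which forms agree with each other depends on the platform and the R version.

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

# Hypothetical helper: spacing between adjacent doubles near x
# (valid for finite, nonzero, normal x)
ulp_at <- function(x) {
  2^(floor(log2(abs(x))) - (.Machine$double.digits - 1))
}

results <- c(
  plus     = xx + yy,          # double-precision addition
  sum_args = sum(xx, yy),      # two arguments, one element each
  sum_vec  = sum(c(xx, yy))    # one argument, summed element by element
)
print(results, digits = 20)

# Distance of each result from xx + yy, measured in ulps
gap_ulps <- (results - results[["plus"]]) / ulp_at(results[["plus"]])
print(gap_ulps)
```

The two values printed in the question differ by about 5.7e-14, which is 2^-44, the spacing between adjacent doubles in the range [256, 512). In other words, the disagreement is confined to the last representable bit.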
I believe (but would be happy to be corrected) that @JonSpring's timing results are probably explained by (1) sum(xx,yy) doing more processing, type-checking, etc. than +; (2) sum(c(xx,yy)) being slightly slower than sum(xx,yy) because it works in extended precision.
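A rough way to compare the three forms using only base R is sketched below. This is not a reproduction of @JonSpring's benchmark; the repetition count n and the helper time_call are arbitrary choices made for this illustration, and every measurement includes the same closure-call overhead, so only the relative ordering is meaningful (and even that may vary between runs and machines).

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998
n  <- 1e5  # number of repetitions; an arbitrary choice

# Hypothetical helper: elapsed seconds for n calls of f()
time_call <- function(f) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

timings <- c(
  plus     = time_call(function() xx + yy),
  sum_args = time_call(function() sum(xx, yy)),
  sum_vec  = time_call(function() sum(c(xx, yy)))
)
print(timings)
```

Note that sum(c(xx, yy)) also pays for building the vector with c(), so a difference between it and sum(xx, yy) cannot be attributed to extended precision alone.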