
When computing arithmetic means, which way of summing is more precise?


Given the task of computing the arithmetic mean of n IEEE 754 double-precision floating-point numbers x_0, x_1, ..., x_(n-1), is it more precise to do

(ksum_i x_i) / n

(i.e. first doing a Kahan sum of all x_i and then dividing by n) or

ksum_i (x_i / n)

(i.e. first dividing each x_i by n and then Kahan-summing)?

My own tests have been inconclusive: with both uniformly distributed random numbers in [0, 1) and normally distributed numbers centered on 0 with σ = 1, the two variants are very precise. However, my choice of test data may have been particularly poor.
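For concreteness, the two candidates can be sketched as follows (a minimal Python sketch; the function names are mine, and `math.fsum` serves only as a high-precision reference):

```python
import math

def kahan_sum(xs):
    """Compensated (Kahan) summation: track the low-order
    rounding error of each addition in a correction term c."""
    s = 0.0
    c = 0.0
    for x in xs:
        y = x - c          # apply the stored correction
        t = s + y
        c = (t - s) - y    # (t - s) recovers the rounded-off part of y
        s = t
    return s

def mean_sum_then_divide(xs):
    # Variant 1: Kahan-sum all x_i, then divide once by n.
    return kahan_sum(xs) / len(xs)

def mean_divide_then_sum(xs):
    # Variant 2: divide each x_i by n, then Kahan-sum the quotients.
    n = len(xs)
    return kahan_sum(x / n for x in xs)

xs = [1.0 / (i + 1) for i in range(1000)]
reference = math.fsum(xs) / len(xs)   # correctly rounded reference mean
```

On benign data like this, both variants land very close to the reference, which matches the inconclusive test results described above.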


Solution

  • Sum first, then divide. If you divide first and then sum, you introduce, in the general case, a rounding error proportional to the largest-magnitude summand, which mostly defeats the point of Kahan summation (in the case of catastrophic cancellation, which is what you're guarding against, your result is just the rounding error from the divide).

    Summing first does carry a somewhat greater risk of undue overflow; to handle that correctly, you would rescale by an exact power of two as needed to prevent overflow. However, overflow here is quite rare, and never something you need to worry about with well-scaled data.
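    A sketch of that rescaling, assuming Python; the function name `mean_with_rescaling` and the choice of exponent are mine:

```python
import math

def mean_with_rescaling(xs, k=64):
    # Multiply each summand by the exact power of two 2**-k before
    # summing, then undo the scaling afterwards.  Scaling by a power
    # of two is exact in IEEE 754 (absent under-/overflow), so it
    # introduces no additional rounding error.
    n = len(xs)
    s = sum(math.ldexp(x, -k) for x in xs)  # use a Kahan sum here in practice
    return math.ldexp(s / n, k)

xs = [1.7e308, 1.5e308]
# Naive: sum(xs) overflows to inf, so sum(xs) / 2 is inf.
# Rescaled: mean_with_rescaling(xs) is about 1.6e308.
```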

    Just to provide a concrete example, consider averaging the values 4503599627370496, -4503599627370498, and 2 in double precision. Even with naive summation, you get the exactly correct result (0) if you sum and then divide. If you divide and then sum, every addition is exact (the first by Sterbenz's lemma), yet the error is still large: the computed result is -0.08333333333333337, which is essentially the rounding error in 4503599627370496/3 (-4503599627370498/3 is exact, and the rounding error in 2/3 is negligible).
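    That example is easy to reproduce; in Python, with plain left-to-right summation (which is exact here, so Kahan compensation would change nothing):

```python
xs = [4503599627370496.0, -4503599627370498.0, 2.0]
n = 3

# Sum first, then divide: the additions cancel exactly, giving 0.
sum_then_divide = (xs[0] + xs[1] + xs[2]) / n

# Divide first, then sum: 4503599627370496/3 rounds (to a multiple
# of 0.25 at that magnitude), and that rounding error dominates the
# final answer.
divide_then_sum = xs[0] / n + xs[1] / n + xs[2] / n
```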