How can I add floats together in different orders, and always get the same total?

Let's say I have three 32-bit floating point values, a, b, and c, such that (a + b) + c != a + (b + c). Is there a summation algorithm, perhaps similar to Kahan summation, that guarantees that these values can be summed in any order and always arrive at the exact same (fairly accurate) total? I'm looking for the general case (i.e. not a solution that only deals with 3 numbers).

Is arbitrary precision arithmetic the only way to go? I'm dealing with very large data sets, so I'd like to avoid the overhead of using arbitrary precision arithmetic if possible.

Thanks!

Solution

There's an interesting 'full-precision-summation' algorithm here, which guarantees that the final sum is independent of the order of the summands (recipe given in Python; but it shouldn't be too difficult to translate to other languages). Note that the recipe as given in that link isn't perfectly correct: the main accumulation loop is fine, but in the final step that converts the list of accumulated partial sums to a single floating-point result (the very last line of the msum recipe), one needs to be a little bit more careful than simply summing the partial sums in order to get a correctly-rounded result. See the comments below the recipe, and Python's implementation (linked below) for a way to fix this.

It does use a form of arbitrary-precision arithmetic to hold partial sums (the intermediate sums are represented as 'non-overlapping' sums of doubles), but may nevertheless be fast enough, especially when all the inputs are of roughly the same magnitude. And it always gives a correctly rounded result, so accuracy is as good as you could hope for and the final sum is independent of the order of the summands. It's based on this paper (Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates) by Jonathan Shewchuk.

Python uses this algorithm for its implementation of math.fsum, which does correctly-rounded order-independent summation; you can see the C implementation that Python uses here--- look for the math_fsum function.