Say I have a (possibly large) vector of floating-point numbers resulting from some black box process. Is it possible to calculate a bitwise reproducible summation of these numbers?
If the black box process always produces the numbers in the same order, then a bitwise reproducible summation is easy: just sum them from left to right.
But if the numbers are generated in a random order, perhaps because they are returned and collected from asynchronous processes, then it is harder: I'd have to sort the numbers.
But what if there are even more numbers, perhaps distributed across different machines, so that it's undesirable to move them?
Is there still a way to deterministically sum them?
Yes!
A method to do just this is described in the paper "Algorithms for Efficient Reproducible Floating Point Summation" by Ahrens, Demmel, and Nguyen (2020). I've included an implementation of it below.
Here we'll compare three algorithms:
- Conventional summation: inaccurate because, as the sum grows, the least-significant digits of the numbers we add to it get silently dropped.
- Kahan summation: overcomes this by keeping a running "compensation" sum which holds onto the least-significant digits.
- Reproducible summation: uses a similar strategy to maintain accuracy, but maintains several bins corresponding to different levels of compensation. These bins also appropriately truncate the significance of the input data to obtain an accurate, reproducible sum.
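For reference, Kahan summation can be sketched in a few lines; this is the standard algorithm, not code lifted from my implementation:

```cpp
#include <vector>

// Minimal sketch of Kahan (compensated) summation. The variable `c` holds the
// low-order bits that would otherwise be lost when a small term is added to a
// large running sum. Note: compile without aggressive FP reassociation
// (e.g. -ffast-math), or the compensation can be optimized away.
template <typename T>
T kahan_sum(const std::vector<T> &data) {
  T sum = 0;
  T c   = 0;                 // running compensation
  for (const T x : data) {
    const T y = x - c;       // apply the compensation to the next term
    const T t = sum + y;     // low-order bits of y may be lost here...
    c   = (t - sum) - y;     // ...recover them (algebraically zero, numerically not)
    sum = t;
  }
  return sum;
}
```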
To study the algorithms, we'll run a test on 1 million numbers each drawn uniformly from the range [-1000, 1000]. To demonstrate both the range of answers of the algorithms as well as their reproducibility, the input will be shuffled and summed 100 times.
A result I'll assert, which the code below demonstrates rigorously, is that for each data type studied the deterministic algorithm yields the same result for each of the 100 data orderings.
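The full test code is linked at the end; conceptually, the reproducibility check can be sketched like this (a simplified stand-in for my actual test harness, not the real thing):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <random>
#include <set>
#include <vector>

// Shuffle `data` `trials` times and count how many bitwise-distinct results
// `sum_fn` produces. A bitwise-reproducible summation returns 1.
template <typename T, typename SumFn>
std::size_t distinct_results(std::vector<T> data, SumFn sum_fn, int trials = 100) {
  static_assert(sizeof(T) <= sizeof(std::uint64_t), "result must fit in 64 bits");
  std::mt19937 rng(42);                  // fixed seed so the shuffles are repeatable
  std::set<std::uint64_t> bit_patterns;  // compare results bit-for-bit, not by value
  for (int i = 0; i < trials; ++i) {
    std::shuffle(data.begin(), data.end(), rng);
    const T result = sum_fn(data);
    std::uint64_t bits = 0;
    std::memcpy(&bits, &result, sizeof(T));
    bit_patterns.insert(bits);
  }
  return bit_patterns.size();
}
```

In the tests below, simple summation yields many distinct bit patterns under this kind of check, while the reproducible accumulator yields exactly one.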
My implementation includes a class which reproducibly accumulates values passed to it, but there are several ways to pass the input data. We'll experiment with three of them: one value at a time ("1ata" in the timings below), many values at once ("many"), and many values at once when the maximum magnitude of the inputs is known ahead of time ("manyc").
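For illustration only, a hypothetical interface along those lines might look like the sketch below; the names (`ReproducibleAccumulator`, `add`, `set_max_abs`) are placeholders I've made up, not the actual API of my implementation or of ReproBLAS:

```cpp
#include <vector>

// Hypothetical interface sketch only -- placeholder names, no implementation.
template <typename T, int FOLD = 3>
class ReproducibleAccumulator {
 public:
  // (1) Feed values one at a time ("1ata" in the timings below).
  void add(T x);

  // (2) Feed many values at once ("many"); the whole batch can be scanned
  //     for its maximum magnitude before being folded in, which is faster.
  void add(const std::vector<T> &xs);

  // (3) If a bound on |x| for the entire input is already known ("manyc"),
  //     tell the accumulator so it can skip that scan entirely.
  void set_max_abs(T max_abs);

  // The deterministic result: identical bits for any ordering of the input.
  T sum() const;
};
```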
The results of the various algorithms are as follows:
Single-Precision Floats
==================================================================
Average Reproducible sum 1ata time = 0.0115s (11.5x simple; 2.6x Kahan)
Average Reproducible sum many time = 0.0047s (4.7x simple; 1.1x Kahan)
Average Reproducible sum manyc time = 0.0036s (3.6x simple; 0.8x Kahan)
Average Kahan summation time = 0.0044s
Average simple summation time = 0.0010s
Reproducible Value = 767376.500000000 (0% diff vs Kahan)
Kahan long double accumulator value = 767376.500000000
Error bound on RV = 15.542840004
Distinct Kahan (single accum) values = 1
Distinct Simple values = 92
Double-Precision Floats
==================================================================
Average Reproducible sum 1ata time = 0.0112s (10.5x simple; 2.6x Kahan)
Average Reproducible sum many time = 0.0051s (4.7x simple; 1.2x Kahan)
Average Reproducible sum manyc time = 0.0037s (3.4x simple; 0.8x Kahan)
Average simple summation time = 0.0011s
Average Kahan summation time = 0.0044s
Reproducible Value = 310844.70069914369378239 (0% diff vs Kahan)
Kahan long double accumulator value = 310844.70069914369378239
Error bound on RV = 0.00000000048315059
Distinct Kahan (double accum) values = 2
Distinct Simple values = 96
Let's try a more challenging test than uniform random data! Instead, let's generate 1 million data points in the range [0, 2π) and use them to generate the y values of a sine wave. As before, we then shuffle these data points 100 times and see what kinds of outputs we get. The timing values are very similar to the above, so let's leave them out.
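A sketch of generating that input (the helper name and seed are arbitrary choices of mine):

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of the sine-wave test set: y = sin(x) for x drawn uniformly from
// [0, 2*pi). Since positive and negative values largely cancel, the true sum
// is near zero and rounding error is far more visible than in the uniform test.
template <typename T>
std::vector<T> make_sine_data(std::size_t n = 1'000'000) {
  const T two_pi = static_cast<T>(6.28318530717958647692);
  std::mt19937 rng(1);
  std::uniform_real_distribution<T> x_dist(T(0), two_pi);
  std::vector<T> y(n);
  for (auto &v : y)
    v = std::sin(x_dist(rng));
  return y;
}
```

Summing these shuffled datasets gives us: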
Single-Precision Floats
==================================================================
Reproducible Value = 0.000000000
Kahan long double accumulator value = -0.000000000
Error bound = 14.901161194
Distinct Kahan (single) values = 96
Distinct Simple values = 100
Kahan lower value = -0.000024229
Kahan upper value = 0.000025988
Simple lower value = -0.017674208
Simple upper value = 0.018800795
Kahan range = 0.000050217
Simple range = 0.036475003
Simple range / Kahan range = 726
Double-Precision Floats
==================================================================
Reproducible Value = 0.00000000000002185
Kahan long double accumulator value = 0.00000000000002185
Error bound = 0.00000000000000083
Distinct Kahan (double) values = 93
Distinct Simple values = 100
Kahan lower value = 0.00000000000000017
Kahan upper value = 0.00000000000006306
Simple lower value = -0.00000000004814216
Simple upper value = 0.00000000002462563
Kahan range = 0.00000000000006289
Simple range = 0.00000000007276779
Simple range / Kahan range = 1157
Here we see that the reproducible value exactly matches the long-double Kahan summation value we use as a source of truth.
In contrast to our results on the previous, simpler test, there are many distinct Kahan summation values (where the Kahan summation uses the same floating-point type as the data). The more difficult dataset has destroyed the "partial determinism" we observed in Kahan summation on the simpler test, even though the range of values produced by Kahan summation is still orders of magnitude smaller than the range obtained via simple summation.
So in this instance reproducible summation gives better accuracy than Kahan summation and yields a single bitwise-reproducible result rather than ~100 different results.
Time. The reproducible summation takes ~3.5x longer than the simple summation and about the same amount of time as Kahan summation.
Memory. Using 3-fold binning requires 6*sizeof(floating_type) space for the accumulator; that is, 24 bytes for single-precision and 48 bytes for double-precision.
Additionally, a static lookup table is needed which can be shared between all accumulators of a given data type and folding. For 3-fold binning this table is 41 floats (164 bytes) or 103 doubles (824 bytes).
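The 6*sizeof(floating_type) figure corresponds to two floating-point words per bin at 3-fold binning; as I understand the paper's scheme, these are a primary value and a carry for each bin. A rough layout sketch, with illustrative names of my own:

```cpp
// Rough storage sketch for a FOLD-fold binned accumulator, assuming each bin
// holds one primary value and one carry word (names are illustrative only).
template <typename T, int FOLD = 3>
struct BinnedAccumulator {
  T primary[FOLD];  // high-order part of each bin
  T carry[FOLD];    // carry/overflow part of each bin
};

static_assert(sizeof(BinnedAccumulator<float>)  == 24, "6 * sizeof(float)");
static_assert(sizeof(BinnedAccumulator<double>) == 48, "6 * sizeof(double)");
```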
So, what have we learned?
Simple summation gives a wide range of results: we ran 100 tests and got 96 unique results for double precision. This is clearly not reproducible and demonstrates well the problem we're trying to solve.
Kahan summation on the other hand produces very few distinct results (thus it is "more" deterministic) and takes approximately 4x as long as conventional summation.
The reproducible algorithm takes ~10x longer than simple summation if we know nothing about the numbers we'll be adding; however, if we know the maximum magnitude of our summands then the reproducible algorithm only takes ~3.5x longer than simple summing.
In comparison to Kahan summation, the reproducible algorithm sometimes takes less time (!) while ensuring that we get answers which are bitwise identical. Further, unlike the method detailed in this answer, the reproducible result matches the Kahan summation result for both single and double precision in our simple test. We also get strong guarantees that the reproducible and Kahan results have a very low relative error.
What folding level is best? 3, per the graph below.
Fold-1 is very bad. For this test, Fold-2 gives good accuracy for double but about 10% relative error for float. Fold-3 gives good accuracy for both double and float with only a small increase in time, so Fold-2 is only worth using for one-at-a-time reproducible summation on doubles. Going higher than Fold-3 gives only marginal benefits.
The full code is too long to include here; however, you can find the C++ version here and ReproBLAS here.