Tags: python-2.7, floating-point, floating-accuracy, floating-point-precision, floating-point-conversion

Maintain precision when averaging floats


Let's say that I have a large number of floats, e.g. 100, and I need to calculate their average.

To get the most accurate result, should I sum all the numbers and then divide by 100?

Or should I divide each number by 100, and then sum all of them?

(If it matters, I'm coding in Python 2.)


Solution

  • I can answer this from a general perspective, rather than a Python perspective. The answer to your question depends on several factors, including the number of values and the range of the values.

    You are correct that simply summing the numbers can lead to bad results; this is what is meant by a numerically unstable algorithm. The problem is inherent to floating-point arithmetic: once the running sum x grows large enough, x + 1 evaluates to x, because the true result rounds back to the nearest representable double, which is x itself.

    However, you probably don't have to worry about 100 numbers, unless they are quite large. This issue more typically arises when working with millions of numbers -- or you can get overflow problems with integer arithmetic.
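
    For illustration, here is a small sketch (the value 1e16 is chosen purely to demonstrate the effect) of how a double silently absorbs a much smaller addend:

    total = 1e16                 # a large running sum
    print(total + 1 == total)    # True: the 1 is lost, since no double lies between 1e16 and 1e16 + 2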

    Dividing each number by the count first is not necessarily a solution either, because you can run into the problem in the other direction: the individual quotients become too small, and precision is lost before you ever sum them.
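
    A contrived sketch of that direction (using the smallest positive double purely for illustration; in practice the more common effect is simply 100 separate rounding errors instead of one):

    tiny = 5e-324          # smallest positive (subnormal) double
    print(tiny / 100)      # 0.0 -- the quotient underflows, and the value is lost before it is ever summed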

    One approach that is more stable is to compute the average iteratively, as a running mean:

    avg(1) = x1
    avg(2) = avg(1) * (1/2) + x2 * (1/2)
    avg(3) = avg(2) * (2/3) + x3 * (1/3)
    . . .
    avg(n) = avg(n - 1) * ((n - 1) / n) + (x(n) / n)
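
    As a concrete sketch, here is the same recurrence in Python (written to be Python 2 friendly, hence the explicit float conversions; the function name is just for illustration):

    def running_mean(values):
        # avg(n) = avg(n - 1) * ((n - 1) / n) + x(n) / n
        avg = 0.0
        for n, x in enumerate(values, 1):
            avg = avg * (n - 1) / float(n) + x / float(n)
        return avg

    print(running_mean([1.0, 2.0, 3.0, 4.0]))   # 2.5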
    

    I should note that if your numbers span a very wide range, you can still have problems. The same is true when you have very large positive and negative numbers that nearly cancel each other. Other methods may have to be used in such cases; they typically take the magnitudes and signs of the values into account.
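
    In Python specifically, one readily available option is math.fsum, which tracks partial sums internally and returns a correctly rounded total; a minimal sketch (the helper name is just for illustration):

    import math

    def accurate_mean(values):
        values = list(values)
        return math.fsum(values) / len(values)

    # A naive sum of these three values gives 0.0; fsum recovers the 1.0 that would otherwise cancel away.
    print(accurate_mean([1e16, 1.0, -1e16]))    # prints one third (0.333...), not 0.0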