numpy numerical-methods floating-point-precision variance

Precision error in numpy.var

I am trying the following code for estimating the variance in a sample, and compare it to the numpy.var implementation.

import numpy as np

def rcov(xj, (i, Mi, Si)):
        j  = i + 1
        Mj = Mi + (xj - Mi) / j
        Sj = Si + (i/j) * (xj - Mi) ** 2
        return (j, Mj, Sj)

def mycov(X):
        s = (0., 0, 0)
        for i in xrange(len(X)):
                s = rcov(X[i], s)
        return s[-1] / (len(X) - 1) # sample covariance

X = np.random.rand(1000000)
N = 1e+15

print
print 'Sample (co)variance with 1 dof.'
print '-------------------------------'
print 'np.var(X)     ', np.var(X, ddof=1)
print 'mycov(X)      ', mycov(X)
print 'np.var(X+N)   ', np.var(X+N, ddof=1)
print 'mycov(X+N)    ', mycov(X+N)

And the output is

Sample (co)variance with 1 dof.
-------------------------------
np.var(X)      0.0833039106062
mycov(X)       0.0833039106062
np.var(X+N)    19208514.2744
mycov(X+N)     0.0859324294763

raising two questions:

would it help if I upgrade to numpy 1.8? I am currently on 1.6.1
numpy is a numerical library. why does it fail at a simple numerical problem?

Solution

For the sake of speed, NumPy does not check for arithmetic overflow or underflow. It's the user's responsibility to choose dtypes which are large enough to maintain the desired level of precision throughout all computations.

Using NumPy 1.8, and selecting X to be of dtype longdouble, rather than the default float64,

X = np.random.rand(1000000).astype('longdouble')

yields

Sample (co)variance with 1 dof.
-------------------------------
np.var(X)      0.0832200737104
mycov(X)       0.0832200737104
np.var(X+N)    0.0832200805148
mycov(X+N)     0.0832199500372