python numpy scipy sparse-matrix

Python: Inconsistency between sparse matrix multiplication and numpy.dot()


Environment: Ubuntu 16.04 (64-bit), Python 3.5.2, NumPy 1.13.3, SciPy 1.0.0.

I ran into this problem when multiplying a scipy.sparse.csc.csc_matrix by a numpy.ndarray. Here is an example:

import numpy as np
import scipy.sparse

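# np.random.random takes the shape as a single tuple
a = np.random.random((1000, 1000))
b = np.random.random((1000, 2000))
da = scipy.sparse.csc_matrix(a)
db = scipy.sparse.csc_matrix(b)

ab = a.dot(b)      # dense  x dense
dadb = da.dot(db)  # sparse x sparse
dab = da.dot(b)    # sparse x dense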

The results then differ as follows:

In [31]: np.sum(dadb.toarray() != ab)
Out[31]: 1869078

In [33]: np.sum(dab != dadb.toarray())
Out[33]: 0

In [34]: np.sum(dab != ab)
Out[34]: 1869078

Why do these results differ, and what should I do about it?


Solution

  • What you are seeing is typical of floating-point arithmetic (for a thorough explanation, see What Every Computer Scientist Should Know About Floating-Point Arithmetic or the answers to Why Are Floating Point Numbers Inaccurate?). Unlike arithmetic on real numbers, floating-point arithmetic is not associative: changing the order of operations can (slightly) change the result, because rounding errors accumulate differently. Here the dense and sparse code paths evidently accumulate the same products in different orders, so they cannot be expected to agree exactly, only approximately.

    You can see this by comparing with np.allclose instead of exact equality:

    >>> np.allclose(dab, ab)
    True
    
    >>> np.allclose(dadb.toarray(), ab)
    True
    

    In short, these operations are behaving as expected.
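
To make the order-of-operations point concrete, here is a minimal sketch that uses NumPy only. The names loop_product and max_abs_diff are made up for illustration, and the explicit loop is just a stand-in for "some routine that sums the same terms in a different order". It is not how SciPy's sparse product is actually implemented.

```python
import numpy as np

# Scalar case: floating-point addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)      # False
print(abs(left - right))  # on the order of 1e-16

# Matrix case: the same product, accumulated in two different orders.
rng = np.random.RandomState(0)
a = rng.random_sample((50, 50))
b = rng.random_sample((50, 80))

blas_product = a.dot(b)

# Illustrative alternative ordering (an assumption, not SciPy's algorithm):
# add one rank-1 outer product per inner index k.
loop_product = np.zeros((a.shape[0], b.shape[1]))
for k in range(a.shape[1]):
    loop_product += np.outer(a[:, k], b[k, :])

# Measure how large the disagreement is instead of counting unequal entries.
max_abs_diff = np.max(np.abs(blas_product - loop_product))
print(max_abs_diff)
print(np.allclose(blas_product, loop_product))  # True
```

Looking at the largest absolute difference tells you more than counting mismatched entries with `!=`: a count of 1869078 looks alarming, but each of those differences is at rounding-error scale. If the default tolerances of np.allclose do not suit your data, you can pass its rtol and atol arguments explicitly.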