python, numpy, sparse-matrix

Why do the multiplication functions of scipy sparse matrices and numpy arrays give different results?


I have two matrices in Python 2.7: a dense matrix A_dense and a sparse matrix A_sparse. I am interested in computing the element-wise product followed by a sum. There are two ways to do this: numpy's multiplication or scipy's sparse multiplication. I expect them to give exactly the same result, differing only in execution time. But I find that they give different results for certain matrix sizes.

import numpy as np
from scipy import sparse

L = 2000
np.random.seed(2)
rand_x = np.random.rand(L)
# Matrix with random values on the sub- and super-diagonal
A_sparse_init = np.diag(rand_x, -1) + np.diag(rand_x, 1)
A_sparse = sparse.csr_matrix(A_sparse_init)
A_dense = np.random.rand(L+1, L+1)
# Difference between the two ways of computing sum(element-wise product)
print np.sum(A_sparse.multiply(A_dense)) - np.sum(np.multiply(A_dense[A_sparse.nonzero()], A_sparse.data))

Output:

1.1368683772161603e-13

If I choose L=2001, then the output is:

0.0

To check how the difference between the two multiplication methods depends on the matrix size, I wrote:

L = 100
np.random.seed(2)
N_loop = 100
multiply_diff_arr = np.zeros(N_loop)
for i in xrange(N_loop):
    rand_x = np.random.rand(L)
    A_sparse_init = np.diag(rand_x, -1) + np.diag(rand_x, 1)
    A_sparse = sparse.csr_matrix(A_sparse_init)
    A_dense = np.random.rand(L+1, L+1)
    # Store the difference between the two methods for this matrix size
    multiply_diff_arr[i] = np.sum(A_sparse.multiply(A_dense)) - np.sum(np.multiply(A_dense[A_sparse.nonzero()], A_sparse.data))
    L += 1

I got the following plot of multiply_diff_arr (the difference between the two methods) against matrix size: [plot]

Can anyone help me understand what's happening? Shouldn't the difference between the two methods be on the order of 1e-18 rather than 1e-13?


Solution

  • I don't have a full answer, but this might help find it:

    Under the hood, scipy.sparse will convert to coo format and do this:

    ret = self.tocoo()
    if self.shape == other.shape:
        data = np.multiply(ret.data, other[ret.row, ret.col])
    

    The question is then why these two operations give different results:

    # c is the flat 1-D array of element-wise products;
    # ret is a COO matrix whose data is set to those same products.
    ret = A_sparse.tocoo()
    c = np.multiply(ret.data, A_dense[ret.row, ret.col])
    ret.data = c.view(type=np.ndarray)

    # In exact arithmetic these two sums would be identical:
    c.sum() - ret.sum()

    -1.1368683772161603e-13
    

    Edit:

    The difference stems from different defaults for which axis add.reduce is applied over first. E.g.:

    A_sparse.multiply(A_dense).sum(axis=1).sum()
    A_sparse.multiply(A_dense).sum(axis=0).sum()
    

    Numpy defaults to reducing over axis 0 first. Since floating-point addition is not associative, summing the same values in a different order can shift the result in its last digits, which is where the ~1e-13 discrepancy comes from.
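
    To make that concrete, here is a minimal standalone sketch of the underlying effect: floating-point addition is not associative, so reducing the same array over different axes first already shifts the last digits of the total. The array name and shape below are arbitrary, purely for illustration.

    import numpy as np

    np.random.seed(2)
    a = np.random.rand(2001, 2001)

    # Reduce over rows first, then add up the row sums ...
    rows_first = a.sum(axis=1).sum()
    # ... versus reduce over columns first, then add up the column sums.
    cols_first = a.sum(axis=0).sum()

    # Exactly the same numbers are added in both cases, yet the totals
    # typically differ slightly because of rounding-order effects.
    print(rows_first - cols_first)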