
Sparse Matrix Multiplication Issue with Python


I am trying to take the dot product of a sparse matrix and its transpose. I'm using the scipy.sparse library and finding that the results are incorrect. See below:

import numpy as np
import scipy.sparse 

#Define the dense matrix
matrix_dense = np.zeros([100000,10])
for i in range(10):
    i_0 = i*10000
    i_1 = (i+1)*10000
    matrix_dense[i_0:i_1,i] = 1

#Define the sparse matrix
cols = []
for i in range(10):
    cols+=[i]*10000

dtype = np.uint8 
rows = range(len(cols)) 
data_csc = np.ones(len(cols), dtype=dtype)
matrix_sparse = scipy.sparse.csc_matrix((data_csc, (rows, cols)), shape=(len(cols), 10), dtype=dtype)

#Check that the two matrices are identical
assert np.abs(matrix_sparse.todense() - matrix_dense).max() == 0 

#Dot product of the dense matrix
dense_product = np.dot(matrix_dense.T,matrix_dense)

#Dot product of the sparse matrix
sparse_product = (matrix_sparse.T)*(matrix_sparse)

The correct answer (given by dense_product) should be a diagonal matrix, where the diagonal terms equal 10,000.

print(dense_product)
[[ 10000.      0.      0.      0.      0.      0.      0.      0.      0.
   0.]
 [     0.  10000.      0.      0.      0.      0.      0.      0.      0.
   0.]
 [     0.      0.  10000.      0.      0.      0.      0.      0.      0.
   0.]
 [     0.      0.      0.  10000.      0.      0.      0.      0.      0.
   0.]
 [     0.      0.      0.      0.  10000.      0.      0.      0.      0.
   0.]
 [     0.      0.      0.      0.      0.  10000.      0.      0.      0.
   0.]
 [     0.      0.      0.      0.      0.      0.  10000.      0.      0.
   0.]
 [     0.      0.      0.      0.      0.      0.      0.  10000.      0.
   0.]
 [     0.      0.      0.      0.      0.      0.      0.      0.  10000.
   0.]
 [     0.      0.      0.      0.      0.      0.      0.      0.      0.
   10000.]]

However, no matter how I compute the sparse product, the result is incorrect:

print(sparse_product.todense())
[[16  0  0  0  0  0  0  0  0  0]
 [ 0 16  0  0  0  0  0  0  0  0]
 [ 0  0 16  0  0  0  0  0  0  0]
 [ 0  0  0 16  0  0  0  0  0  0]
 [ 0  0  0  0 16  0  0  0  0  0]
 [ 0  0  0  0  0 16  0  0  0  0]
 [ 0  0  0  0  0  0 16  0  0  0]
 [ 0  0  0  0  0  0  0 16  0  0]
 [ 0  0  0  0  0  0  0  0 16  0]
 [ 0  0  0  0  0  0  0  0  0 16]]

I've tried different ways of performing the sparse dot product and get exactly the same answer:

sparse_product_1 = np.dot(matrix_sparse.T,matrix_sparse)
sparse_product_2 = (matrix_sparse.T).dot(matrix_sparse)
sparse_product_3 = scipy.sparse.csr_matrix.dot(matrix_sparse.T, matrix_sparse)

Any idea what's going on?


Solution

  • You are using a data type of uint8, which can only hold values from 0 to 255, so the sums in the product overflow. Unsigned integer arithmetic wraps around modulo 256, and 10000 % 256 is 16, which is exactly the value on your diagonal.

    Here's an example of what is happening:

    # astype wraps silently; recent NumPy rejects np.array(10000, dtype=np.uint8)
    x = np.array(10000).astype(np.uint8)
    x
    array(16, dtype=uint8)
    

    Changing your dtype to np.int64 works for me:

    dtype = np.int64
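
For completeness, here is a minimal sketch of the overflow and one possible fix, assuming a reasonably recent NumPy and SciPy. It rebuilds the matrix from the question with uint8 data and then casts it to int64 before multiplying, as an alternative to changing `dtype` at construction time. The names `n_cols`, `per_col`, `m_uint8`, `m_int64`, `wrapped` and `correct` are made up for this illustration.

```python
import numpy as np
import scipy.sparse

# Same layout as the question: 10 columns with 10,000 ones in each column.
n_cols, per_col = 10, 10000
rows = np.arange(n_cols * per_col)
cols = np.repeat(np.arange(n_cols), per_col)
data = np.ones(rows.size, dtype=np.uint8)

m_uint8 = scipy.sparse.csc_matrix((data, (rows, cols)),
                                  shape=(rows.size, n_cols))

# uint8 can represent 0..255 only, so a sum of 10,000 ones cannot fit.
print(np.iinfo(np.uint8).max)   # 255
print(10000 % 256)              # 16, the value the question reports

# The product keeps the uint8 dtype, so its entries wrap around.
wrapped = (m_uint8.T @ m_uint8).toarray()

# Casting to a wider integer type before multiplying avoids the wraparound.
m_int64 = m_uint8.astype(np.int64)
correct = (m_int64.T @ m_int64).toarray()
print(correct.diagonal())
```

Keeping the stored matrix as uint8 and casting only for the multiplication may save memory if the matrix is large, at the cost of a temporary int64 copy. Whether that trade-off is worthwhile depends on your data.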