Search code examples
pythonnumpyscipysparse-matrixmatrix-multiplication

Numpy/Scipy Sparse vs dense multiplication


There appears to be some discrepancy between the scipy sparse matrix types and the normal numpy matrix type

import scipy.sparse as sp
A = sp.dia_matrix(tri(3,4))
vec = array([1,2,3,4])

print A * vec                        #array([ 1.,  3.,  6.])

print A * (mat(vec).T)               #matrix([[ 1.],
                                     #        [ 3.],
                                     #        [ 6.]])

print A.todense() * vec              #ValueError: matrices are not aligned

print A.todense() * (mat(vec).T)     #matrix([[ 1.],
                                     #        [ 3.],
                                     #        [ 6.]])

Why can sparse matrices work out that an array should be interpreted as a column vector when normal matrices cannot?


Solution

  • In the spmatrix class (which you can check in scipy/sparse/base.py) the __mul__() there is a set of "ifs" that can answer your question:

    class spmatrix(object):
        ...
        def __mul__(self, other):
            ...
            M,N = self.shape
            if other.__class__ is np.ndarray:
                # Fast path for the most common case
                if other.shape == (N,):
                    return self._mul_vector(other)
                elif other.shape == (N, 1):
                    return self._mul_vector(other.ravel()).reshape(M, 1)
                elif other.ndim == 2  and other.shape[0] == N:
                    return self._mul_multivector(other)
    

    For a 1D array it will always go to _mul_vector() from compressed.py, inside class _cs_matrix, with the code given below:

    def _mul_vector(self, other):
        M,N = self.shape
    
        # output array
        result = np.zeros(M, dtype=upcast_char(self.dtype.char,
                                               other.dtype.char))
    
        # csr_matvec or csc_matvec
        fn = getattr(sparsetools,self.format + '_matvec')
        fn(M, N, self.indptr, self.indices, self.data, other, result)
    
        return result
    

    Note that it assumes the output with the number of lines of the sparse matrix. Basically, it treats your input 1D array as fitting to the number of columns of the sparse array (there is not tranpose or non-tranpose). But for a ndarray with ndim==2 it cannot do such assumption, so that if you tried:

    vec = np.array([[1,2,3,4],
                    [1,2,3,4]])
    

    A * vec.T would be the only option that works.

    For a 1D matrix the sparse module also does not assume that it fits the number of columns. To check that you can try:

    A * mat(vec)
    #ValueError: dimension mismatch
    

    And A * mat(vec).T would be your only choice.