python numpy scipy sparse-matrix matrix-multiplication

Numpy/Scipy Sparse vs dense multiplication

There appears to be some discrepancy between the scipy sparse matrix types and the normal numpy matrix type

import scipy.sparse as sp
A = sp.dia_matrix(tri(3,4))
vec = array([1,2,3,4])

print A * vec                        #array([ 1.,  3.,  6.])

print A * (mat(vec).T)               #matrix([[ 1.],
                                     #        [ 3.],
                                     #        [ 6.]])

print A.todense() * vec              #ValueError: matrices are not aligned

print A.todense() * (mat(vec).T)     #matrix([[ 1.],
                                     #        [ 3.],
                                     #        [ 6.]])

Why can sparse matrices work out that an array should be interpreted as a column vector when normal matrices cannot?

Solution

In the spmatrix class (which you can check in scipy/sparse/base.py) the __mul__() there is a set of "ifs" that can answer your question:

class spmatrix(object):
    ...
    def __mul__(self, other):
        ...
        M,N = self.shape
        if other.__class__ is np.ndarray:
            # Fast path for the most common case
            if other.shape == (N,):
                return self._mul_vector(other)
            elif other.shape == (N, 1):
                return self._mul_vector(other.ravel()).reshape(M, 1)
            elif other.ndim == 2  and other.shape[0] == N:
                return self._mul_multivector(other)

For a 1D array it will always go to _mul_vector() from compressed.py, inside class _cs_matrix, with the code given below:

def _mul_vector(self, other):
    M,N = self.shape

    # output array
    result = np.zeros(M, dtype=upcast_char(self.dtype.char,
                                           other.dtype.char))

    # csr_matvec or csc_matvec
    fn = getattr(sparsetools,self.format + '_matvec')
    fn(M, N, self.indptr, self.indices, self.data, other, result)

    return result

Note that it assumes the output with the number of lines of the sparse matrix. Basically, it treats your input 1D array as fitting to the number of columns of the sparse array (there is not tranpose or non-tranpose). But for a ndarray with ndim==2 it cannot do such assumption, so that if you tried:

vec = np.array([[1,2,3,4],
                [1,2,3,4]])

A * vec.T would be the only option that works.

For a 1D matrix the sparse module also does not assume that it fits the number of columns. To check that you can try:

A * mat(vec)
#ValueError: dimension mismatch

And A * mat(vec).T would be your only choice.