unexpected outcome caused by scipy.csr_matrix

>>> import numpy as np
>>> from scipy.sparse import *
>>> x1 = np.eye(3, dtype=float)
>>> x2 = csr_matrix(x1, dtype=float, shape =x1.shape)
>>> assert x2.todense().any()==x1.any()  ## holds true
>>> w = np.ones((3,1))
>>> dw1 = w - x1[:,0]
>>> dw2 = w - x2[:,0]

which gives me

>>> print dw1
[[ 0.  1.  1.]
 [ 0.  1.  1.]
 [ 0.  1.  1.]]

while

>>> print dw2
[[ 0.]
 [ 1.]
 [ 1.]]

My question is why dw1 and dw2 differ? Should they defer, Is it a bug? Many Thanks!

Solution

This is a slicing/indexing issue. The questionable line here is

w - x1[:, 0]

which has nothing to do with sparseness. You have sliced x1, obtaining a 1D array. When this gets subtracted from w, numpy broadcasts this array back into a 3 by 3 matrix (because it equates the number of columns of both terms), which I guess is not what you wanted.

It looks like you just wanted the submatrix consisting of the first column of x1. This would be

w - x1[:, [0]]

returning

array([[ 0.],
       [ 1.],
       [ 1.]])

consistent with the other result.

In case of a sparse matrix, you automatically get a submatrix (not a 1D array) because indexing works differently for those.