I am trying to subsample a SciPy sparse matrix into a NumPy array, keeping every 10th row and every 10th column, like this:
import numpy as np
from scipy import sparse

connections = sparse.csr_matrix((data, (node1_index, node2_index)),
                                shape=(dimensions, dimensions))
connections_sampled = np.zeros((dimensions // 10, dimensions // 10))
connections_sampled = connections[::10, ::10]
However, when I run this and query the shape of connections_sampled, I get the original dimensions of connections instead of dimensions reduced by a factor of 10.
Does this type of subsampling not work with sparse matrices? It seems to work with smaller matrices, but I can't get it to give the correct answer here.
You cannot slice out every 10th row and column of a CSR matrix, at least not in SciPy 0.12:
>>> import scipy.sparse as sps
>>> a = sps.rand(1000, 1000, format='csr')
>>> a[::10, ::10]
Traceback (most recent call last):
...
ValueError: slicing with step != 1 not supported
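One workaround that does not rely on step slicing at all is to filter the stored entries directly. This is a minimal sketch, not part of the original answer: the helper name `subsample_every_kth` is hypothetical, and it assumes the matrix can be converted to COO triplets with `sps.coo_matrix`. It keeps only the entries whose row and column indices are both multiples of `k`, then divides those indices by `k`.

```python
import numpy as np
import scipy.sparse as sps


def subsample_every_kth(m, k=10):
    """Keep rows/columns 0, k, 2k, ... of a sparse matrix (hypothetical helper).

    Filters the COO triplets instead of slicing, so it does not depend on
    step slicing being supported for CSR matrices.
    """
    coo = sps.coo_matrix(m)
    n_rows, n_cols = coo.shape
    # A step-k slice keeps ceil(n / k) indices, same as len(range(0, n, k)).
    out_shape = ((n_rows + k - 1) // k, (n_cols + k - 1) // k)
    # Keep only entries whose row AND column index are multiples of k.
    keep = (coo.row % k == 0) & (coo.col % k == 0)
    # Index 10*i in the original becomes index i in the subsampled matrix.
    return sps.csr_matrix(
        (coo.data[keep], (coo.row[keep] // k, coo.col[keep] // k)),
        shape=out_shape,
    )


a = sps.rand(1000, 1000, density=0.01, format='csr')
b = subsample_every_kth(a, 10)
print(b.shape)  # (100, 100)
```

The result stays in CSR format, so it never materializes the full dense matrix.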
You can do it, though, by first converting to a LIL (list of lists) format matrix:
>>> a.tolil()[::10, ::10]
<100x100 sparse matrix of type '<type 'numpy.float64'>'
with 97 stored elements in LInked List format>
As you can see, the shape is updated correctly. If you want a NumPy array rather than a sparse matrix, try:
>>> a.tolil()[::10, ::10].toarray()
array([[ 0., 0., 0., ..., 0., 0., 0.],
       [ 0., 0., 0., ..., 0., 0., 0.],
       [ 0., 0., 0., ..., 0., 0., 0.],
       ...,
       [ 0., 0., 0., ..., 0., 0., 0.],
       [ 0., 0., 0., ..., 0., 0., 0.],
       [ 0., 0., 0., ..., 0., 0., 0.]])
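Applied to the code in the question, there is also no need to preallocate `connections_sampled` with `np.zeros`: the slice creates a new object, and the assignment simply rebinds the name. Note as well that a step-10 slice keeps indices 0, 10, 20, ..., so it has `ceil(dimensions / 10)` rows, which differs from `dimensions / 10` whenever `dimensions` is not a multiple of 10. A minimal sketch, where `dimensions = 1005` and the random `connections` matrix are hypothetical stand-ins for the question's data:

```python
import numpy as np
import scipy.sparse as sps

dimensions = 1005  # hypothetical size, deliberately not a multiple of 10
connections = sps.rand(dimensions, dimensions, density=0.001, format='csr')

# A step-10 slice keeps indices 0, 10, ..., 1000, i.e. ceil(1005 / 10) of them.
expected = len(range(0, dimensions, 10))
assert expected == (dimensions + 9) // 10 == 101

# Slice through LIL, then densify; no np.zeros preallocation is needed.
connections_sampled = connections.tolil()[::10, ::10].toarray()
print(connections_sampled.shape)  # (101, 101)
```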