Slicing a sparse scipy matrix to subsample for every 10th row and column


I am trying to subsample a scipy sparse matrix into a numpy matrix, taking every 10th row and every 10th column, like this:

connections = sparse.csr_matrix((data,(node1_index,node2_index)),
                                shape=(dimensions,dimensions))
connections_sampled = np.zeros((dimensions // 10, dimensions // 10))  # integer division: shapes must be ints
connections_sampled = connections[::10,::10]

However, when I run this and query the shape of connections_sampled, I get the original dimensions of connections instead of dimensions reduced by a factor of 10.

Does this type of subsampling not work with sparse matrices? It seems to work when I use smaller matrices, but I can't get it to give the correct answer here.


Solution

  • Slicing with a step (every 10th row and column) is not supported on a CSR matrix, at least not in SciPy 0.12:

    >>> import scipy.sparse as sps
    >>> a = sps.rand(1000, 1000, format='csr')
    >>> a[::10, ::10]
    Traceback (most recent call last):
    ...    
    ValueError: slicing with step != 1 not supported
    

    You can do it, though, by converting first to a LIL format matrix:

    >>> a.tolil()[::10, ::10]
    <100x100 sparse matrix of type '<type 'numpy.float64'>'
        with 97 stored elements in LInked List format>
    

    As you see, the shape is updated correctly. If you want a numpy array, not a sparse matrix, try:

    >>> a.tolil()[::10, ::10].A
    array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
           [ 0.,  0.,  0., ...,  0.,  0.,  0.],
           [ 0.,  0.,  0., ...,  0.,  0.,  0.],
           ..., 
           [ 0.,  0.,  0., ...,  0.,  0.,  0.],
           [ 0.,  0.,  0., ...,  0.,  0.,  0.],
           [ 0.,  0.,  0., ...,  0.,  0.,  0.]])
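
Putting the steps above together, here is a minimal sketch of the LIL round trip as a reusable helper. The function name `subsample` and the test matrix are made up for illustration and are not part of the question's code. It assumes a SciPy version that provides `scipy.sparse.random`, and it uses `.toarray()` in place of the `.A` shorthand, which should be equivalent here.

```python
import numpy as np
import scipy.sparse as sps


def subsample(matrix, step=10):
    """Hypothetical helper: keep every `step`-th row and column.

    Converts to LIL first, since LIL accepts stepped slices,
    then converts the result back to CSR.
    """
    return matrix.tolil()[::step, ::step].tocsr()


# Illustrative random matrix standing in for `connections`.
a = sps.random(1000, 1000, density=0.01, format='csr', random_state=0)

sampled = subsample(a, step=10)   # still sparse
dense = sampled.toarray()         # plain numpy array, if that is what you need

print(sampled.shape)  # (100, 100)
```

Checking `sampled.shape` right after slicing is a cheap way to confirm that the subsampling really happened before building anything on top of the result.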