Index a SciPy sparse matrix with an array of booleans


NumPy arrays can be indexed with an array of booleans to select the rows corresponding to True entries:

>>> X = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> rows = np.array([True,False,True])
>>> X[rows]
array([[1, 2, 3],
       [7, 8, 9]])
>>> X[np.logical_not(rows)]
array([[4, 5, 6]])

But this does not seem to be possible with SciPy sparse matrices; the booleans are interpreted as integer indices, so False selects row 0 and True selects row 1. How can I get the NumPy-like behavior?


Solution

  • You can use np.nonzero (or ndarray.nonzero) on your boolean array to get the corresponding integer indices, then use those to index the sparse matrix. Because "fancy indexing" on sparse matrices is quite limited compared to dense ndarrays, you need to unpack the one-element tuple returned by nonzero and use the : slice to request all columns explicitly. In the session below, sparse is the name of the sparse matrix being indexed:

    >>> rows.nonzero()
    (array([0, 2]),)
    >>> indices = rows.nonzero()[0]
    >>> indices
    array([0, 2])
    >>> sparse[indices, :]
    <2x100 sparse matrix of type '<type 'numpy.float64'>'
            with 6 stored elements in LInked List format>
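
    For reference, the same approach can be sketched as a self-contained script. This is only an illustration under assumptions: the 3x3 matrix reuses the values from the question, the CSR format is an arbitrary choice, and the names S, selected and rest are made up for the example. np.flatnonzero is used as a shorthand for rows.nonzero()[0].

```python
import numpy as np
import scipy.sparse as sp

# Assumed example data: the 3x3 matrix from the question, stored as CSR.
S = sp.csr_matrix(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
rows = np.array([True, False, True])

# Convert the boolean mask to integer row indices (same as rows.nonzero()[0]).
indices = np.flatnonzero(rows)

# Select the masked rows and all columns; the result is still sparse.
selected = S[indices, :]

# The complementary rows, analogous to X[np.logical_not(rows)] in NumPy.
rest = S[np.flatnonzero(~rows), :]

print(selected.toarray())
print(rest.toarray())
```

    Converting the mask to integer indices first sidesteps the question of how a given SciPy version or sparse format treats a boolean index.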