python scipy sparse-matrix

Indices of a scipy.sparse csc_matrix are reversed after extracting some columns


I'm trying to extract columns of a SciPy sparse CSC matrix, but the row indices of the result are not stored in the order I'd expect. Here's what I mean:

In [77]: a = scipy.sparse.csc_matrix(np.ones([4, 5]))
In [78]: ind = np.array([True, True, False, False, False])
In [79]: b = a[:, ind]

In [80]: b.indices
Out[80]: array([3, 2, 1, 0, 3, 2, 1, 0], dtype=int32)

In [81]: a.indices
Out[81]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int32)

Why is b.indices not [0, 1, 2, 3, 0, 1, 2, 3]? And since this is not the behaviour I expect, is a[:, ind] the wrong way to extract columns from a CSC matrix?


Solution

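  • The indices are simply not sorted. The CSC format does not require the row indices within a column to be in ascending order, so b still represents the same matrix and a[:, ind] is a correct way to extract columns. If you need the indices in order, you can either reverse a's rows while slicing, which is not very intuitive and also flips the row order of the data (it goes unnoticed here only because every entry is 1), or enforce sorted indices with sorted_indices(). The latter returns a sorted copy; sort_indices() does the same in place, but I prefer the copy. Curiously, the has_sorted_indices attribute does not always hold a boolean: in the output below it is 1 or 0 for b and b2, but True for b3.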

    import numpy as np
    import scipy.sparse

    a = scipy.sparse.csc_matrix(np.ones([4, 5]))
    ind = np.array([True, True, False, False, False])
    b = a[::-1, ind]           # select columns and reverse the row order
    b2 = a[:, ind]             # plain column selection
    b3 = b2.sorted_indices()   # sorted copy; b2.sort_indices() sorts in place
    
    b.indices
    >>array([0, 1, 2, 3, 0, 1, 2, 3], dtype=int32)
    b.has_sorted_indices
    >>1
    b2.indices
    >>array([3, 2, 1, 0, 3, 2, 1, 0], dtype=int32)
    b2.has_sorted_indices
    >>0
    b3.indices
    >>array([0, 1, 2, 3, 0, 1, 2, 3], dtype=int32)
    b3.has_sorted_indices
    >>True
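
To check that index order does not change the matrix itself, and that the row-reversal trick does, here is a minimal sketch. It assumes a matrix with distinct values (np.arange instead of np.ones, my choice for illustration) so that row order becomes visible; the variable names are hypothetical. Whether a[:, ind] returns unsorted indices in the first place may depend on your SciPy version, so the sketch only relies on the state after sort_indices().

```python
import numpy as np
import scipy.sparse

# Distinct, non-zero values (an assumption for illustration) make row order visible.
a = scipy.sparse.csc_matrix(np.arange(1.0, 21.0).reshape(4, 5))
ind = np.array([True, True, False, False, False])
expected = a.toarray()[:, ind]  # dense reference for the first two columns

# Plain column selection, then sort the row indices of each column in place.
b = a[:, ind]
b.sort_indices()
same_matrix = np.array_equal(b.toarray(), expected)

# Reversing the rows while slicing yields a different matrix: the rows are flipped.
flipped = a[::-1, ind]
flipped_differs = not np.array_equal(flipped.toarray(), expected)
flipped_is_reversed = np.array_equal(flipped.toarray(), expected[::-1])

print(b.indices)
print(same_matrix, flipped_differs, flipped_is_reversed)
```

So for anything other than a constant matrix, sorting the indices is the safer of the two options.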