Tags: python-3.x, scipy, sparse-matrix

How to eliminate zeros in a sparse matrix (Python)?


I need a sparse matrix (I'm using the Compressed Sparse Row (CSR) format from scipy.sparse) to do some computation. I have it in the form of a (data, (row, col)) tuple. Unfortunately, some of the rows and columns are entirely zero, and I would like to get rid of them. Right now I have:

[In]:
     from scipy.sparse import csr_matrix
     aa = csr_matrix(((1,2,3), ((0,2,2), (0,1,2))))
     aa.todense()
[Out]:
     matrix([[1, 0, 0],
             [0, 0, 0],
             [0, 2, 3]], dtype=int64)

And I would like to have:

[Out]:
    matrix([[1, 0, 0],
            [0, 2, 3]], dtype=int64)

After using the method eliminate_zeros() on the object I get None:

[In]:
     aa2 = csr_matrix.eliminate_zeros(aa)
     type(aa2)
[Out]:
     <class 'NoneType'>

Why does that method turn it into None?

Is there any other way to get a sparse matrix (doesn't have to be CSR) and get rid of empty rows/columns easily?

I'm using Python 3.4.0.


Solution

  • In CSR format it is relatively easy to get rid of the all-zero rows:

    >>> import numpy as np
    >>> import scipy.sparse as sps
    >>> a = sps.csr_matrix([[1, 0, 0], [0, 0, 0], [0, 2, 3]])
    >>> a.indptr
    array([0, 1, 1, 3])
    >>> mask = np.concatenate(([True], a.indptr[1:] != a.indptr[:-1]))
    >>> mask  # keeps indptr[0] plus the end pointer of every non-empty row
    array([ True,  True, False,  True], dtype=bool)
    >>> sps.csr_matrix((a.data, a.indices, a.indptr[mask])).toarray()
    array([[1, 0, 0],
           [0, 2, 3]])
    

    You could then convert the matrix to CSC format, where the exact same trick on indptr gets rid of the all-zero columns.

    I am not sure how well it will perform, but the much more readable syntax:

    >>> a[a.getnnz(axis=1) != 0][:, a.getnnz(axis=0) != 0].toarray()
    array([[1, 0, 0],
           [0, 2, 3]])
    

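    also works.

    As for eliminate_zeros(): it modifies the matrix in place and returns None, and it only removes explicitly stored zero entries; it never changes the shape of the matrix.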
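
The CSC step is only described in words above, so here is a minimal sketch of the row and column passes together. The helper name drop_empty_rows_and_cols is hypothetical (it is not a scipy function). It assumes "empty" means "no stored entries", so it calls eliminate_zeros() on a copy first, and it passes shape explicitly instead of letting scipy infer it from the indices.

```python
import numpy as np
import scipy.sparse as sps


def drop_empty_rows_and_cols(m):
    """Return a CSR copy of m without all-zero rows and columns.

    Hypothetical helper for illustration; not part of scipy.
    """
    # Work on a copy so the caller's matrix is left untouched.
    csr = sps.csr_matrix(m, copy=True)
    csr.eliminate_zeros()  # in place: drops explicitly stored zeros

    # Row i is non-empty exactly when indptr[i + 1] != indptr[i].
    keep_rows = np.diff(csr.indptr) != 0
    row_mask = np.concatenate(([True], keep_rows))
    n_rows = int(keep_rows.sum())
    csr = sps.csr_matrix(
        (csr.data, csr.indices, csr.indptr[row_mask]),
        shape=(n_rows, csr.shape[1]),
    )

    # Same trick in CSC, where indptr delimits columns instead of rows.
    csc = csr.tocsc()
    keep_cols = np.diff(csc.indptr) != 0
    col_mask = np.concatenate(([True], keep_cols))
    n_cols = int(keep_cols.sum())
    csc = sps.csc_matrix(
        (csc.data, csc.indices, csc.indptr[col_mask]),
        shape=(n_rows, n_cols),
    )
    return csc.tocsr()


# Row 1 and column 2 hold no non-zero entries.
a = sps.csr_matrix([[1, 0, 0, 0],
                    [0, 0, 0, 0],
                    [0, 2, 0, 3]])
b = drop_empty_rows_and_cols(a)
print(b.toarray())
```

If readability matters more than avoiding the indexing overhead, the getnnz-based one-liner gives the same result; this version just makes the indptr manipulation explicit for both axes.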