I need a sparse matrix (I'm using the Compressed Sparse Row (CSR) format from scipy.sparse) to do some computation. I have the data as a (data, (row, col)) tuple. Unfortunately, some of the rows and columns will be entirely zero, and I would like to get rid of them. Right now I have:
[In]:
from scipy.sparse import csr_matrix
aa = csr_matrix(((1,2,3), ((0,2,2), (0,1,2))))
aa.todense()
[Out]:
matrix([[1, 0, 0],
        [0, 0, 0],
        [0, 2, 3]], dtype=int64)
And I would like to have:
[Out]:
matrix([[1, 0, 0],
        [0, 2, 3]], dtype=int64)
After calling the method eliminate_zeros() on the object I get None:
[In]:
aa2 = csr_matrix.eliminate_zeros(aa)
type(aa2)
[Out]:
<class 'NoneType'>
Why does that method turn it into None?
Is there any other way to get a sparse matrix (doesn't have to be CSR) and get rid of empty rows/columns easily?
I'm using Python 3.4.0.
In CSR format it is relatively easy to get rid of the all-zero rows:
>>> import numpy as np
>>> import scipy.sparse as sps
>>> a = sps.csr_matrix([[1, 0, 0], [0, 0, 0], [0, 2, 3]])
>>> a.indptr
array([0, 1, 1, 3])
>>> mask = np.concatenate(([True], a.indptr[1:] != a.indptr[:-1]))
>>> mask # 1st occurrence of unique a.indptr entries
array([ True, True, False, True], dtype=bool)
>>> sps.csr_matrix((a.data, a.indices, a.indptr[mask])).toarray()
array([[1, 0, 0],
       [0, 2, 3]])
You could then convert your sparse matrix to CSC format, and the exact same trick will get rid of the all-zero columns.
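As a rough sketch of that two-step idea, the same `indptr` mask can be applied once in CSR form and once in CSC form. The helper name `drop_empty_rows_and_cols` is made up for illustration, and the sketch assumes the matrix holds no explicitly stored zeros (stored zeros count as entries in `indptr`, so a row containing only stored zeros would be kept):

```python
import numpy as np
import scipy.sparse as sps

def drop_empty_rows_and_cols(m):
    """Hypothetical helper: drop all-zero rows and columns via the indptr trick."""
    m = sps.csr_matrix(m)
    # Keep indptr[0] plus every entry that differs from its predecessor,
    # i.e. the end pointer of each row that stores at least one value.
    mask = np.concatenate(([True], m.indptr[1:] != m.indptr[:-1]))
    n_rows = int(mask.sum()) - 1
    m = sps.csr_matrix((m.data, m.indices, m.indptr[mask]),
                       shape=(n_rows, m.shape[1]))
    # In CSC form indptr runs over columns, so the same mask drops empty columns.
    m = m.tocsc()
    mask = np.concatenate(([True], m.indptr[1:] != m.indptr[:-1]))
    n_cols = int(mask.sum()) - 1
    m = sps.csc_matrix((m.data, m.indices, m.indptr[mask]),
                       shape=(m.shape[0], n_cols))
    return m.tocsr()

# Example with one empty row (index 1) and one empty column (index 2).
a = sps.csr_matrix([[1, 0, 0, 0], [0, 0, 0, 0], [0, 2, 0, 3]])
result = drop_empty_rows_and_cols(a).toarray()
print(result)
```

Passing `shape` explicitly is a deliberate choice here: without it the size of the non-compressed axis is inferred from the largest stored index, which may not be what you want.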
I am not sure how well it will perform, but the much more readable syntax:
>>> a[a.getnnz(axis=1) != 0][:, a.getnnz(axis=0) != 0].toarray()
array([[1, 0, 0],
       [0, 2, 3]])
also works.
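As for the `None`: `eliminate_zeros()` modifies the matrix in place and returns nothing, so assigning its result gives `None`. It also only drops explicitly stored zero entries and never changes the shape, so it cannot remove empty rows or columns on its own. A small sketch of both points, using the matrix from the question with one explicitly stored zero added at position (1, 1) purely for illustration (the variable names are mine, not part of any API):

```python
import scipy.sparse as sps

# Matrix from the question, plus an explicitly stored zero at (1, 1).
aa = sps.csr_matrix(([1, 0, 2, 3], ([0, 1, 2, 2], [0, 1, 1, 2])))

stored_before = aa.nnz        # number of stored entries, explicit zero included
ret = aa.eliminate_zeros()    # works in place; there is no return value
stored_after = aa.nnz

# Boolean masks of rows/columns that still hold at least one stored entry.
keep_rows = aa.getnnz(axis=1) != 0
keep_cols = aa.getnnz(axis=0) != 0
trimmed = aa[keep_rows][:, keep_cols]

print(ret)                          # None
print(stored_before, stored_after)  # stored entries before and after
print(aa.shape, trimmed.shape)      # shape only changes after masking
print(trimmed.toarray())
```

Calling `eliminate_zeros()` first matters because `getnnz` counts stored entries, so a row holding only explicit zeros would otherwise survive the masking step.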