I am trying to use numpy together with scipy to handle sparse matrices, but while evaluating the sparsity of a matrix I ran into behaviour I don't understand:
import numpy as np
import scipy.sparse as sp
a = sp.csc_matrix(np.ones((3, 3)))
a
np.count_nonzero(a)
When I evaluate a and its nonzero count with the code above, I see this output in IPython:
Out[9]: <3x3 sparse matrix of type '<class 'numpy.float64'>' with 9 stored elements in Compressed Sparse Column format>
Out[10]: 1
I think there is something I don't understand here. A 3x3 matrix full of ones should have 9 nonzero terms, and that is the answer I get if I first convert it with scipy's toarray method. Am I using numpy and scipy the wrong way?
The nonzero count is available as an attribute:
In [295]: a=sparse.csr_matrix(np.arange(9).reshape(3,3))
In [296]: a
Out[296]:
<3x3 sparse matrix of type '<class 'numpy.int32'>'
with 8 stored elements in Compressed Sparse Row format>
In [297]: a.nnz
Out[297]: 8
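Applied to the all-ones matrix from the question, a minimal sketch looks like this. It assumes a reasonably recent SciPy, in which sparse matrices also offer a count_nonzero() method; the variable names are only illustrative.

```python
import numpy as np
from scipy import sparse

# The 3x3 all-ones matrix from the question, stored in CSC format.
a = sparse.csc_matrix(np.ones((3, 3)))

# Number of stored entries (explicitly stored zeros would be included).
stored = a.nnz

# Sparse method that counts only entries whose value is nonzero
# (assumes a SciPy version that provides count_nonzero()).
counted = a.count_nonzero()

# The numpy function is safe once the matrix is a dense ndarray.
dense_count = np.count_nonzero(a.toarray())

print(stored, counted, dense_count)
```

All three counts agree here because the matrix has no explicitly stored zeros.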
As Warren commented, you can't count on numpy functions working on sparse matrices. Use the sparse functions and methods instead. Sometimes a numpy function is written in a way that invokes the array's own method, in which case the function call might work. But that is true only on a case-by-case basis.
In IPython I make heavy use of a.<tab> to get a list of completions (attributes and methods). I also use function?? to look at the code.
In the case of np.count_nonzero I see no code: it is compiled, and only works on np.ndarray objects.
np.nonzero(a) works. Look at its code and you will see that it looks up the array's own method: nonzero = a.nonzero
The sparse nonzero method code is:
def nonzero(self):
    ...
    # convert to COOrdinate format
    A = self.tocoo()
    nz_mask = A.data != 0
    return (A.row[nz_mask], A.col[nz_mask])
The A.data != 0 line is there because it is possible to construct a matrix whose stored data includes explicit zeros, particularly if you use the coo (data,(i,j)) input format. So, apart from that caution, the nnz attribute gives a reliable count.
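To make the caveat about stored zeros concrete, here is a small sketch. The matrix and variable names are made up for illustration; it builds a COO matrix from (data,(i,j)) with one explicit zero and compares the counts.

```python
import numpy as np
from scipy import sparse

# Build a 3x3 COO matrix from (data, (row, col)); the middle value is an
# explicitly stored zero.
data = np.array([1.0, 0.0, 2.0])
row = np.array([0, 1, 2])
col = np.array([0, 1, 2])
m = sparse.coo_matrix((data, (row, col)), shape=(3, 3))

# nnz counts stored entries, so the explicit zero is included.
stored = m.nnz

# nonzero() masks out stored zeros before returning the indices.
true_nonzero = len(m.nonzero()[0])

# eliminate_zeros() works in place and drops the stored zero.
c = m.tocsr()
c.eliminate_zeros()
after_cleanup = c.nnz

print(stored, true_nonzero, after_cleanup)
```

Here nnz reports one more entry than there are nonzero values until eliminate_zeros() has been called.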
Doing a.<tab> I also see the a.getnnz and a.eliminate_zeros methods, which may be helpful if you are worried about sneaky zeros.
Sometimes it is useful to work directly with the data attributes of a sparse matrix. Reading them is safer than modifying them, and each sparse format has its own set of attributes. In the csr case you can do:
In [306]: np.count_nonzero(a.data)
Out[306]: 8
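For reference, a short sketch of what the csr attributes hold for the same example matrix. The per_row name is only illustrative; data, indices and indptr are the standard CSR arrays.

```python
import numpy as np
from scipy import sparse

a = sparse.csr_matrix(np.arange(9).reshape(3, 3))

# CSR keeps three arrays:
#   data    - the stored values, row by row
#   indices - the column index of each stored value
#   indptr  - offsets into data/indices marking where each row starts and ends
values = a.data
columns = a.indices

# Differences of consecutive indptr entries give the stored entries per row.
per_row = np.diff(a.indptr)

# Counting on the data array is fine because it is a plain ndarray.
nonzero_values = np.count_nonzero(a.data)

print(values, columns, per_row, nonzero_values)
```

The first row stores only two entries because its leading 0 is dropped when the dense array is converted.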